ORIGIN OF REPLICATION COMPLEX GENES,
PROTEINS AND METHODS

INTRODUCTION 

The research carried out in the subject application was supported in part by 
grants from the National Institutes of Health. The government may have rights in 
any patent issuing on this application. 

Technical Field 

The technical field of this invention concerns Origin of Replication
Complex genes which are involved with DNA transcription and replication.

Background

The elements involved in the early events of eukaryotic DNA replication
have begun to emerge in the yeast Saccharomyces cerevisiae. A critical first step
was the identification of ARS elements derived from yeast chromosomes, a subset
of which were subsequently shown to act as chromosomal origins of DNA
replication (reviewed in 11). Sequence comparison of a number of ARS elements
resulted in the identification of the ARS consensus sequence (ACS, 12). This
sequence is essential for the function of yeast origins of DNA replication (7, 12,
13). Three additional elements required for efficient ARS1 function have been
identified. When mutated individually, these elements, referred to as B1, B2, and
B3, result in a slight reduction of ARS1 activity. When two or three of the B
elements are simultaneously mutated, however, ARS1 function is severely
compromised (14).

Proteins that recognize two elements of ARS1 have been identified. The
yeast transcription factor ABF1 binds to and mediates the function of the B3
element (11, 14). More recently we have identified a multi-protein complex that
specifically recognizes the highly conserved ACS (15). This activity, referred to
as the origin recognition complex (ORC), has several properties that make it an
attractive candidate to act as an initiator protein at yeast origins of replication.
Binding of this protein requires the ACS, and the effect of mutations in the
consensus sequence on ARS1 function parallels the effect of the same mutations on
ORC DNA binding. ORC binds to more than 10 yeast ARS elements, several of
which are known origins of DNA replication (15). Specific DNA binding by ORC
requires ATP, suggesting that ORC binds ATP, a property of a number of known
initiator proteins (17). ORC also interacts with other sequences outside of the ACS
that are known to be important for ARS function (18, 19). Further support for
the hypothesis that ORC mediates the function of the ACS is provided by in situ
deoxyribonuclease I (DNase I) footprinting experiments that identify a protected
region of ARS1 remarkably similar to that observed with ORC in vitro (20).



Relevant Literature 

A multi-protein complex that recognizes cellular origins of DNA replication
was reported in Bell and Stillman (1992) Nature 357, 128-134. Much of the
present disclosure was published by Foss et al. (1993), Bell et al. (1993) and Li
and Herskowitz (1993), in Science 262, 1838, 1843 and 1870, respectively, issue
date December 17, 1993. Wang and Reed (1993) Nature 364, 121-126 report using
a single-hybrid screen as disclosed herein.

SUMMARY OF THE INVENTION 
Origin of DNA Replication Complex (ORC) genes, recombinant ORC
peptides and methods of identifying DNA binding proteins and using the subject
compositions are provided.

Provided are compositions comprising isolated nucleic acids encoding
unique ORC gene portions, especially portions encoding biologically active unique
portions of ORC1-ORC6 proteins. Vectors and cells comprising such DNA
molecules find use in the production of recombinant ORC peptides.



The subject compositions are used to isolate ORC genes from a wide
variety of species, including human. The subject ORC peptides also find particular
use in screening for ORC selective agents useful in the diagnosis, prognosis or
treatment of disease, particularly fungal infections and neoproliferative disease.
Particularly useful are agents capable of distinguishing an ORC protein of an
infectious organism or transformed cell from the wild-type human homologue.

Also disclosed are methods for identifying a gene encoding a protein which
directly or indirectly associates with a selected DNA sequence. Generally, the
methods involve transforming an expression library of hybrid proteins into a
reporter strain, wherein the library comprises protein-coding sequences fused to a
constitutively expressed transcription activation domain and the reporter strain
comprises a reporter gene with at least one copy of a selected DNA sequence in its
promoter region. Clones expressing the transcription or translation product of the
reporter gene are detected and recovered. A preferred method employs an
activation domain from GAL4 and a lacZ reporter gene.



BRIEF DESCRIPTION OF SEQUENCE ID NUMBERS
SEQUENCE ID NO: 1. DNA Sequence of ORC1.
SEQUENCE ID NO: 2. Amino Acid Sequence of ORC1.
SEQUENCE ID NO: 3. DNA Sequence of ORC2.
SEQUENCE ID NO: 4. Amino Acid Sequence of ORC2.
SEQUENCE ID NO: 5. DNA Sequence of ORC3.
SEQUENCE ID NO: 6. Amino Acid Sequence of ORC3.
SEQUENCE ID NO: 7. DNA Sequence of ORC4.
SEQUENCE ID NO: 8. Amino Acid Sequence of ORC4.
SEQUENCE ID NO: 9. DNA Sequence of ORC5.
SEQUENCE ID NO: 10. Amino Acid Sequence of ORC5.
SEQUENCE ID NO: 11. DNA Sequence of ORC6.
SEQUENCE ID NO: 12. Amino Acid Sequence of ORC6.

DESCRIPTION OF SPECIFIC EMBODIMENTS 
The recombinant polypeptides of the invention comprise unique portions of
the disclosed ORC proteins which retain a binding affinity specific to the subject
full-length ORC protein. A "unique portion" has an amino acid sequence unique to
the subject ORC in that it is not found in any previously known protein and has a
length at least long enough to define a peptide specific to that ORC. Unique
portions are found to vary from about 5 to about 25 residues, usually from 5 to 10
residues in length, depending on the particular amino acid sequence, and are readily
identified by comparing the subject portion sequences with known peptide/protein
sequence databases. Hence, the term polypeptide as used herein defines an amino
acid polymer with as few as five residues. ORCs used in the subject screening assays
are frequently smaller deletion mutants of full-length ORC proteins. Typically,
such deletion mutants are readily generated using conventional molecular
techniques and screened for an ORC-specific binding affinity using the various
assays described below, e.g. footprint analysis, coimmunoprecipitation, etc.
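
The database comparison used to identify unique portions can be illustrated with a minimal sketch. It assumes the known sequences are available as plain strings and treats a window as "unique" when it occurs in none of them. The function name `unique_windows` and the toy sequences are hypothetical and are not part of the disclosure; a practical search would query a full protein sequence database.

```python
def unique_windows(query, known, k=5):
    """Return (offset, peptide) pairs for every k-residue window of `query`
    that does not occur as a substring of any sequence in `known`."""
    hits = []
    for i in range(len(query) - k + 1):
        window = query[i:i + k]
        # A window counts as unique only if no known sequence contains it.
        if not any(window in seq for seq in known):
            hits.append((i, window))
    return hits


# Toy sequences invented purely for illustration (not ORC sequences).
query = "MAKTLKDLQGW"
known = ["MAKTLKAAAA", "CCCDLQGWCC"]
for offset, peptide in unique_windows(query, known, k=5):
    print(offset, peptide)
```

The window length `k` corresponds to the lower bound of about five residues given above; larger values up to about 25 can be scanned the same way.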

ORC-specific retained binding affinities include the ability to selectively
bind a nucleic acid of a defined sequence, an ORC protein or a compound such as
an antibody which is capable of selectively binding an ORC protein. As such,
binding specificity may be provided by an ORC-specific immunological epitope,
lectin binding site, etc. Selective binding is conveniently shown by competition
with labeled ligand using recombinant ORC peptide either in vitro or in cell based
systems as disclosed herein. Generally, selective binding requires a binding

20 affinity of 10"^M, preferably lO'^M, more preferably 10**^, under in vitro 
conditions as exemplified below. 

The subject recombinant polypeptides may be free or covalently coupled to
other atoms or molecules. Frequently the polypeptides are present as a portion of
a larger polypeptide comprising the subject polypeptide where the remainder of the
larger polypeptide need not be ORC-derived. The subject polypeptides are
typically "isolated", meaning unaccompanied by at least some of the material with
which they are associated in their natural state. Generally, an isolated polypeptide
constitutes at least about 1%, preferably at least about 10%, and more preferably at
least about 50% by weight of the total poly/peptide in a given sample. By pure
peptide/polypeptide is intended at least about 60%, preferably at least 80%, and
more preferably at least about 90% by weight of total polypeptide. Included in the
subject polypeptide weight are any atoms, molecules, groups, etc. covalently
coupled to the subject polypeptides, such as detectable labels, glycosylations,
phosphorylations, etc.

The subject polypeptides may be isolated or purified in a variety of ways
known to those skilled in the art depending on what other components are present
in the sample and to what, if anything, the polypeptide is covalently linked.
Purification methods include electrophoretic, molecular, immunological and
chromatographic techniques, especially affinity chromatography and RP-HPLC in
the case of peptides. For general guidance in suitable purification techniques, see
Scopes, R., Protein Purification, Springer-Verlag, NY (1982).

The polypeptides may be modified or joined to other compounds using
physical, chemical, and molecular techniques disclosed or cited herein or otherwise
known to those skilled in the relevant art to affect their ORC/receptor binding
specificity or other properties such as solubility, membrane transportability,
stability, toxicity, bioavailability, localization, detectability, in vivo half-life, etc.
as assayed by methods disclosed herein or otherwise known to those of ordinary
skill in the art. Other modifications to further modulate binding specificity/affinity
include chemical/enzymatic intervention (e.g. fatty acid-acylation, proteolysis,
glycosylation) and, especially where the poly/peptide is integrated into a larger
polypeptide, selection of a particular expression host, etc. Amino and/or carboxyl
termini may be functionalized, e.g., for the amino group, acylation or alkylation,
and for the carboxyl group, esterification or amidification, or the like.

Many of the disclosed poly/peptides contain glycosylation sites and patterns
which may be disrupted or modified, e.g. by enzymes like glycosidases. For
instance, N or O-linked glycosylation sites of the disclosed poly/peptides may be
deleted or substituted for by another basic amino acid such as Lys or His for
N-linked glycosylation alterations, or deletions or polar substitutions are introduced at
Ser and Thr residues for modulating O-linked glycosylation. Glycosylation
variants are also produced by selecting appropriate host cells, e.g. yeast, insect, or
various mammalian cells, or by in vitro methods such as neuraminidase digestion.
Other covalent modifications of the disclosed poly/peptides may be introduced by
reacting the targeted amino acid residues with an organic derivatizing (e.g.
methyl-3-[(p-azido-phenyl)dithio]propioimidate) or crosslinking agent (e.g.
1,1-bis(diazoacetyl)-2-phenylethane) capable of reacting with selected side chains or
termini. For therapeutic and diagnostic localization, the subject poly/peptides
may be labeled directly (radioisotopes, fluorescers, etc.) or indirectly with
an agent capable of providing a detectable signal, for example, a heart muscle
kinase labeling site.

ORC polypeptides with ORC binding specificity are identified in a variety
of ways including crosslinking, or preferably, by screening such polypeptides for
binding to or disruption of ORC-ORC complexes. Additional ORC-specific agents
include specific antibodies that can be modified to a monovalent form, such as Fab,
Fab', or Fv, specifically binding oligopeptides or oligonucleotides and most
preferably, small molecular weight organic compounds. For example, the
disclosed ORC peptides are used as immunogens to generate specific polyclonal or
monoclonal antibodies. See, Harlow and Lane (1988) Antibodies, A Laboratory
Manual, Cold Spring Harbor Laboratory, for general methods.

Other prospective ORC specific agents are screened from large libraries of
synthetic or natural compounds. Alternatively, libraries of natural compounds in
the form of bacterial, fungal, plant and animal extracts are available or readily
producible. Additionally, natural and synthetically produced libraries and
compounds are readily modified through conventional chemical, physical, and
biochemical means. See, e.g. Houghten et al. and Lam et al. (1991) Nature 354,
84 and 81, respectively, and Blake and Litzi-Davis (1992) Bioconjugate Chem 3,
510.

Useful agents are identified with assays employing a compound comprising
the subject polypeptides or encoding nucleic acids. A wide variety of in vitro,
cell-free binding assays, especially assays for specific binding to immobilized
compounds comprising ORC polypeptide, find convenient use. For example,
immobilized ORC-ORC or ORC-nucleic acid complexes provide convenient targets
for disruption, e.g. as measured by the dissociation of a labelled component of
the complex. Such assays are amenable to scale-up, high throughput usage suitable
for volume drug screening. While less preferred, cell-based assays may be used to
determine specific effects of prospective agents.

Preferred agents are ORC- and species-specific. Useful agents may be
found within numerous chemical classes, though typically they are organic
compounds, preferably small organic compounds. Small organic compounds have
a molecular weight of more than 150 yet less than about 4,500, preferably less
than about 1500, more preferably less than about 500. Exemplary classes include
steroids, heterocyclics, polycyclics, substituted aromatic compounds, and the like.

Selected agents may be modified to enhance efficacy, stability,
pharmaceutical compatibility, and the like. Structural identification of an agent
may be used to identify, generate, or screen additional agents. For example,
where peptide agents are identified, they may be modified in a variety of ways as
described above, e.g. to enhance their proteolytic stability. Other methods of
stabilization may include encapsulation, for example, in liposomes, etc. The
subject binding agents are prepared in any convenient way known to those in the
art.

For therapeutic uses, the compositions and agents disclosed herein may be
administered by any convenient way. Small organics are preferably administered
orally; other compositions and agents are preferably administered parenterally,
conveniently in a pharmaceutically or physiologically acceptable carrier, e.g.,
phosphate buffered saline, or the like. Typically, the compositions are added to a
retained physiological fluid. As examples, many of the disclosed therapeutics are
amenable to direct injection or infusion, topical, intratracheal/nasal administration
e.g. through aerosol, intraocularly, or within/on implants e.g. collagen, osmotic
pumps, grafts comprising appropriately transformed cells, etc. Generally, the
amount administered will be empirically determined, typically in the range of about
10 to 1000 µg/kg of the recipient. For peptide agents, the concentration will
generally be in the range of about 50 to 500 µg/ml in the dose administered.
Other additives may be included, such as stabilizers, bactericides, etc. These
additives will be present in conventional amounts.

The invention provides isolated nucleic acids encoding ORC genes, their
transcriptional regulatory regions and the disclosed unique ORC polypeptides which
retain ORC-specific function. As used herein: an "isolated" nucleic acid is present
as other than a naturally occurring chromosome or transcript in its natural state and
is typically joined in sequence to at least one nucleotide with which it is not
normally associated on a natural chromosome; nucleic acids with substantial
sequence similarity hybridize under low stringency conditions, for example, at
50°C and SSC (0.9 M saline/0.09 M sodium citrate) and remain bound when
subject to washing at 55°C with SSC, wherein regions of non-identity of
substantially similar nucleic acid sequences preferably encode redundant codons; a
partially pure nucleotide sequence constitutes at least about 5%, preferably at least
about 30%, and more preferably at least about 90% by weight of total nucleic acid
present in a given fraction; unique portions of the disclosed nucleic acids are of
length sufficient to distinguish previously known nucleic acids, hence a unique
portion has a nucleotide sequence at least long enough to define a novel
oligonucleotide, usually at least about 18 bp in length, preferably at least about 36
nucleotides in length.

Typically, the invention's ORC polypeptide encoding polynucleotides are
associated with heterologous sequences. Examples of such heterologous sequences
include regulatory sequences such as promoters, enhancers, response elements,
signal sequences, polyadenylation sequences, etc., introns, 5' and 3' noncoding
regions, etc. According to a particular embodiment of the invention, portions of
the coding sequence are spliced with heterologous sequences to produce soluble,
secreted fusion proteins, using appropriate signal sequences and optionally, a
fusion partner such as β-Gal. For antisense applications where the inhibition of
expression is indicated, especially useful oligonucleotides are between about 10 and
30 nucleotides in length and include sequences surrounding the disclosed ATG start
site, especially the oligonucleotides defined by the disclosed sequence beginning
about 5 nucleotides before the start site and ending about 10 nucleotides after the
disclosed start site. The ORC encoding nucleic acids can be subject to alternative
purification, synthesis, modification, sequencing, expression, transfection,
administration or other use by methods disclosed in standard manuals such as
Current Protocols in Molecular Biology (Eds. Ausubel, Brent, Kingston, Moore,
Seidman, Smith and Struhl, Greene Publ. Assoc., Wiley-Interscience, NY, NY,
1992) or that are otherwise known in the art.
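
The antisense window described above (about 5 nucleotides before the ATG start site through about 10 nucleotides after it) can be sketched as follows. This is an illustrative sketch only: the function name `antisense_oligo`, the choice to count the 10 downstream nucleotides from the end of the ATG codon, and the toy sequence are all assumptions and are not taken from the disclosed sequences.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(sense, start_index, upstream=5, downstream=10):
    """Return the reverse complement of the sense-strand window running from
    `upstream` nt before the ATG at `start_index` to `downstream` nt after it.

    Assumption: `downstream` is counted from the end of the ATG codon, so the
    default window is 5 + 3 + 10 = 18 nt, inside the 10-30 nt range.
    """
    if sense[start_index:start_index + 3] != "ATG":
        raise ValueError("start_index does not point at an ATG codon")
    begin = start_index - upstream
    end = start_index + 3 + downstream
    if begin < 0 or end > len(sense):
        raise ValueError("window extends past the end of the sequence")
    window = sense[begin:end]
    # Complement every base, then reverse to obtain the antisense strand 5'->3'.
    return window.translate(COMPLEMENT)[::-1]


# Toy sense strand invented for illustration; the ATG is at index 10.
toy_sense = "GCTAGCCACCATGGCTAAAACCCTGAAGG"
print(antisense_oligo(toy_sense, toy_sense.index("ATG")))
```

Adjusting `upstream` and `downstream` gives other oligonucleotides within the stated length range that still span the start site.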

The invention also provides vectors comprising the described ORC nucleic
acids. A large number of vectors, including plasmid and viral vectors, have been
described for expression in a variety of eukaryotic and prokaryotic hosts.
Advantageously, vectors will often include a promoter operably linked to an ORC
polypeptide-encoding portion, one or more replication systems for cloning or
expression, and one or more markers for selection in the host, e.g. antibiotic
resistance. The inserted coding sequences may be synthesized, isolated from
natural sources, prepared as hybrids, etc. Suitable host cells may be
transformed/transfected/infected by any suitable method including electroporation,
CaCl2 mediated DNA uptake, viral infection, microinjection, microprojectile, or
other methods.

Appropriate host cells include bacteria, archebacteria, fungi, especially
yeast, and plant and animal cells, especially mammalian cells. Of particular
interest are E. coli, B. subtilis, Saccharomyces cerevisiae, SF9 cells, C129 cells,
293 cells, Neurospora, and CHO, COS, HeLa cells, immortalized mammalian
myeloid and lymphoid cell lines, and pluripotent cells, especially mammalian ES
cells and zygotes. Preferred expression systems include COS-7, 293, BHK, CHO,
TM4, CV1, VERO-76, HELA, MDCK, BRL 3A, W138, Hep G2, MMT 060562,
TRI cells, and baculovirus systems. Preferred replication systems include M13,
ColE1, SV40, baculovirus, lambda, adenovirus, AAV, BPV, etc. A large number
of transcription initiation and termination regulatory regions have been isolated and
shown to be effective in the transcription and translation of heterologous proteins in
the various hosts. Examples of these regions, methods of isolation, manner of
manipulation, etc. are known in the art.

For the production of stably transformed cells and transgenic animals, the
subject nucleic acids may be integrated into a host genome by recombination
events. For example, such a nucleic acid can be electroporated into a cell, and
thereby effect homologous recombination at the site of an endogenous gene, an
analog or pseudogene thereof, or a sequence with substantial identity to an
ORC-encoding gene. Other recombination-based methods such as nonhomologous
recombinations, deletion of endogenous gene by homologous recombination,
especially in pluripotent cells, etc., provide additional applications. Preferred
transgenics and stable transformants over-express or under-express (e.g. knock-out
cells and animals) a disclosed ORC gene and find use in drug development and as
a disease model. Methods for making transgenic animals, usually rodents, from
ES cells or zygotes are known to those skilled in the art.

The compositions and methods disclosed herein may be used to effect gene
therapy. See, e.g. Zhu et al. (1993) Science 261, 209-211; Gutierrez et al. (1992)
Lancet 339, 715-721. For example, cells are transfected with ORC-encoding
sequences operably linked to gene regulatory sequences capable of effecting altered
ORC expression or regulation. To modulate ORC translation, target cells may be
transfected with complementary antisense polynucleotides. For gene therapy
involving the grafting/implanting/transfusion of transfected cells, administration
will depend on a number of variables that are ascertained empirically. For
example, the number of cells will vary depending on the stability of the transferred
cells. Transfer medium is typically a buffered saline solution or other
pharmacologically acceptable solution. Similarly the amount of other administered
compositions, e.g. transfected nucleic acid, protein, etc., will depend on the
manner of administration, purpose of the therapy, and the like.

The genes encoding six ORC subunits from S. cerevisiae are used to obtain
the functional homologues of the ORC proteins from other species. For example,
we have demonstrated that the ORC1 gene is conserved in a related fungus,
Kluyveromyces lactis. The ORC1 genes in both S. cerevisiae and K. lactis contain
conserved primary protein sequences that are utilized to obtain the ORC1 gene from
other species including other fungi and from human. Using oligonucleotide
primers based on the conserved sequences between S. cerevisiae and K. lactis, PCR
is used to identify the ORC1 protein in any eukaryotic species. The cloned gene
encoding ORC1 polypeptide from any fungus or from human cells is used to express
the protein in a bacterial expression system to make antibodies against the
polypeptide. These antibodies are used to immunoprecipitate the ORC complex
from the relevant species. Using the disclosed techniques for protein sequencing,
the sequence of the ORC polypeptides is obtained. Using the protein sequencing
methodologies disclosed herein for cloning the S. cerevisiae protein, other genes or
cDNAs encoding the ORC polypeptides from other fungal species and from human
cells are obtained. As we demonstrate herein how to reconstitute the ORC
complex by expressing each of the S. cerevisiae genes in a baculovirus expression
vector and infecting Sf9 insect cells with viruses expressing each of the ORC
subunits, these genes are used to express the ORC polypeptides and reconstitute
activity. In this way, large amounts of ORC protein from any fungal or
mammalian species, including human cells, are obtained.
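
Locating conserved stretches between two homologous protein sequences, as a starting point for primer design, can be illustrated with a short sketch. It assumes the two sequences have already been aligned (gaps written as "-") by some external tool; the function name `conserved_blocks`, the `min_len` threshold and the toy sequences are hypothetical and do not represent the actual S. cerevisiae or K. lactis ORC1 sequences.

```python
def conserved_blocks(aligned_a, aligned_b, min_len=6):
    """Return (start, block) pairs for runs of identical, non-gap residues
    at least `min_len` long in two pre-aligned protein sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    blocks = []
    run_start = None
    for i, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        same = a == b and a != "-"
        if same and run_start is None:
            run_start = i  # a new identical run begins here
        elif not same and run_start is not None:
            if i - run_start >= min_len:
                blocks.append((run_start, aligned_a[run_start:i]))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(aligned_a) - run_start >= min_len:
        blocks.append((run_start, aligned_a[run_start:]))
    return blocks


# Toy aligned sequences invented for illustration only.
seq_a = "MKTWDEHFYPQ-LLNGMWCA"
seq_b = "MRTWDEHFYAQKLL-GMWCA"
print(conserved_blocks(seq_a, seq_b, min_len=6))
```

Each reported block is a candidate region from which degenerate oligonucleotide primers could be back-translated; that step is not shown here.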

Inhibitors of ORC protein in fungi provide valuable reagents to selectively
inhibit fungal cell proliferation by inhibiting the initiation of DNA
replication. This offers a powerful, selective target for antifungal agents valuable
in controlling fungal infections in human and other species. For example, as
disclosed herein, inhibiting the ORC function by mutation in S. cerevisiae can
actually cause the death of the mutant cells.

In human proliferative disorders such as cancer, cells of the diseased tissue
undergo uncontrolled cell proliferation. A key event in this cell proliferation is the
initiation of DNA replication. Inhibiting the initiation of DNA replication through
inhibition of ORC function provides a valuable target for inhibitors of cell growth.
By expressing each of the cDNAs encoding the ORC proteins, either individually
or together in an expression system, ORC function is reconstituted in vitro. Using
this recombinant, expressed protein, inhibitors of ORC function are obtained that
block the initiation of DNA replication in the cell cycle. As described above, small
molecular inhibitors of ORC DNA binding or other activities provide valuable
reagents as anti-cancer and anti-proliferation drugs.

The following examples are offered by way of illustration and not by way
of limitation.



EXAMPLES 

Example 1. 

Transcriptional silencing and ORC.

The binding of purified ORC to the ARS consensus sequence (ACS) at each
of the mating type silencers was tested using a DNase I protection assay (22).
ORC protected the match to the ACS at each of the four silencers in an
ATP-dependent manner. In addition, at each silencer characteristic hypersensitive sites
of DNase I cleavage were observed initiating 12-13 bp from the ACS and
extending away from the consensus sequence at approximately 10 bp intervals.
This pattern of DNase I protection and enhanced cleavage is nearly identical to that
observed at non-silencer sequences and indicates that ORC binding to these
elements is not fundamentally different from its binding at other ARS elements.

At HML-E, HML-I, and HMR-E the only protection observed included the
ACS. At HMR-I, however, we observed a second unexpected footprint that did
not overlap a strong match to the ACS. Moreover, unlike all previous sites bound
by ORC, this protection showed little dependence upon the addition of ATP to the
binding reaction. Although there are two partial matches to the ACS in this
region, similar sequences in other ARS elements and silencers were not recognized
by ORC, suggesting that these sequences did not direct this unusual
ATP-independent binding of ORC to DNA. In combination with the protection observed
at the ACS, the boundaries of the ORC footprint at HMR-I were very similar to
the boundaries of HMR-I defined by deletion mutagenesis (23). These experiments
demonstrate that ORC binds all four of the mating-type silencers, that ORC can
bind sequences other than the ACS and that it plays an important role at HML and
HMR.

A clear link between ORC function and transcriptional silencing was
provided by the finding that a mutation in a gene encoding a subunit of ORC was
defective for repression at HMR (below). To clone the genes encoding the various
ORC subunits, peptides derived from each of the ORC subunits were sequenced
(24). A candidate gene, referred to as ORC2, was isolated by complementation of
a temperature sensitive mutation that showed silencing defects at the permissive
temperature. Genetic experiments suggested that ORC2 mediated the silencing
function of the ACS at HMR-E, making it a good candidate to encode a subunit of
ORC (below). Comparison of the predicted amino acid sequence of ORC2 showed
that all of the peptides derived from the 72 kd subunit of ORC were within the
open reading frame of the ORC2 gene, indicating that it encoded the second largest
subunit of ORC.

ORC2 mutations alter ORC function in vitro.

To address the effect of ORC2 mutations on ORC function in vitro, extracts
were prepared from both orc2-1 and ORC2 strains (25). Fractions derived from
wild-type cells showed strong ORC DNase I protection over the ACS and B1
elements of ARS1 in DNase I footprinting. In contrast, fractions derived from
orc2-1 cells showed a dramatic reduction in ORC DNA binding activity. The ACS
and the B1 element were no longer protected from DNase I cleavage. Only the
characteristic enhanced DNase I cleavages in the B domain of ARS1 remained.
Mutations that disrupt ORC DNA binding at ARS1 prevented the residual DNA
binding observed with the mutant fractions, indicating that this binding required the
ACS. The DNA binding defects were also not due to a general inhibition of DNA
binding, as mixing of mutant and wild type fractions did not reduce binding of the
wild type protein. Incubation of the mutant cells at the non-permissive temperature
was not necessary to observe defects in ORC DNA binding, which explains the
defect observed in mating-type regulation at the permissive temperature (below).

To investigate the polypeptide composition of ORC derived from orc2-1
and ORC2 cells, immuno-blots of these fractions were probed with polyclonal
antibodies raised against ORC. 30 µg of partially purified ORC derived from
either JRY3688 (ORC2) or JRY3687 (orc2-1) was separated on a 10%
SDS-polyacrylamide gel and transferred to nitrocellulose. The resulting protein blot was
incubated with polyclonal mouse sera raised against the entire ORC complex. These
sera detect all but the 50 kd subunit of ORC. Antibody-antigen complexes were
detected with horseradish peroxidase conjugated secondary antibodies followed by
incubation with a chemiluminescent substrate.

Wild type fractions contained the 120, 72, 62, 56, and 53 kd subunits of
ORC in roughly equal quantity. The mutant fractions, however, showed a
distinctly different subunit composition. While the amount of the 120 and 56 kd
subunits was only slightly reduced relative to the wild type fraction, the amount of
the 72, 62, and 53 kd subunits was reduced dramatically. In UV cross-linking
experiments the same three subunits are specifically cross-linked to DNA in an
ACS and ATP dependent manner, suggesting an important role for one or more of
these subunits in ORC DNA binding (15). Thus, the absence of these subunits
explains the defects in DNA binding observed in vitro and indicates that the orc2-1
mutation results in a reduction of ORC stability, or that a defect in Orc2p also
results in reduced DNA binding of an intact ORC complex.
orc2-1 cells are defective for entry into S-phase.

The point in the cell cycle at which the essential function of ORC2 is performed
in vivo was investigated using alpha factor and hydroxyurea (HU) as cell cycle
landmarks (26). Our results were consistent with the execution of the essential
function of Orc2p between late G1 and the initiation of DNA synthesis. Arrest
with HU followed by release into the non-permissive temperature resulted in 89%
of the cells completing an additional cell cycle, indicating that the essential function
for Orc2p was executed before the HU arrest point in the cell cycle. In contrast,
blocking the cell cycle with alpha-factor followed by release at the non-permissive
temperature resulted in only 41% of the cells completing an additional cell
cycle. This phenotype indicates that the Orc2p function was performed at or near
the G1-S phase boundary.

To address the role of ORC in yeast DNA replication more directly, the
DNA content of asynchronous cultures of either orc2-1 or isogenic wild type cells
was measured at various times after shifting from the permissive to the
non-permissive temperature by fluorescent cytometric analysis (27). JRY3687 (orc2-1)
or JRY3688 (ORC2) cells grown at 24°C (0 minute time point) or at various times
after shifting to the non-permissive temperature (37°C) were fixed, stained with
propidium iodide, and analyzed for DNA content using a Coulter Model Epics-C
Flow Cytometer. In addition, a small number of cells (approximately 1000) from
each time point were returned to the permissive temperature to determine the
percentage of cells that remained viable at a given time point. Initially, the DNA
content of both wild type and mutant cells was equally divided between 1C and 2C
with approximately 10% of the cells in S phase. At early time points after the
temperature shift (15-70 minutes) there was a dramatic loss of orc2-1 cells in
S-phase, suggesting that entry into S-phase had been halted. Consistent with this
hypothesis, as the time course continued the orc2-1 mutant showed a rapid
accumulation of cells with a 1C DNA content and a commensurate decrease in
cells with a 2C DNA content (50-100 minutes). Between 100 and 120 minutes, a
new population of orc2-1 cells was observed that appeared to enter into a delayed S
phase. By 150 minutes the bulk of the mutant cells were in this population and
after 180 minutes only a few cells remained with a 1C DNA content.

Interestingly, we observed a strong correlation between entry into the new
round of DNA synthesis and a loss of orc2-1 cell viability. Similar experiments
with isogenic ORC2 cells showed that these effects were specific to the orc2-1
mutation. These findings indicate that at the non-permissive temperature the
orc2-1 cells were initially unable to enter S phase, but later entered into an
abortive round of DNA replication. Entry into this type of replication appears to
be a lethal event. Overall, the analysis of the orc2-1 mutation provides in vivo
evidence that ORC acts early in S-phase in general, and as the initiator protein at
yeast origins of replication in particular.
Identification of the ORC6 gene.
A second gene that represented a strong candidate to encode one of the
subunits of ORC was the AAP1 gene. This gene was cloned using a novel screen
for proteins that bound to the ACS in vivo (below). When the peptide sequences
were compared to the predicted amino acid sequence of this gene, we found that
all of the peptides derived from the 50 kd subunit of ORC were encoded by the
open reading frame of the AAP1 gene (28). For this reason we now refer to AAP1
as ORC6, as it encodes the smallest of the six ORC subunits. The identification
of this gene as a subunit of ORC provides direct evidence that ORC is bound to
the ACS in vivo.
Numbered Citations for Introduction and Example 1 
1. Callan, Cold Spring Harbor Symp. Quant. Biol. 38, 195-203 (1973).

2. Fangman and Brewer, Cell 71, 363-366 (1992).

13. S. P. Bell and B. Stillman, Nature 357, 128-134 (1992).

14. Kornberg & Baker, DNA Replication (Freeman & Co, NY, 1991) v2.

15. C. S. Newlon, Microbiol. Rev. 52, 568-601 (1988).

16. Newlon and Theis, Curr. Opin. Genet. Dev. 3, (1993).

18. Jacob et al., Cold Spring Harbor Symp. Quant. Biol. 28, 329-348 (1963).

19. DNase I footprinting was performed as previously described (15).

20. J. B. Feldman et al., J. Mol. Biol. 178, 815-834 (1984).

21. To obtain sufficient protein for peptide sequencing, a revised purification
procedure for ORC was devised. Whole cell extract was prepared from 400 g of
frozen BJ926 cells using a bead beater (Biospec Products) until greater than 90%
breakage was achieved. One twelfth volume of a saturated (at 4°C) solution of
ammonium sulfate was added to the broken cells and stirred for 30 minutes. This
solution was then spun at 13,000 x g for 20 minutes. The resulting supernatant
was spun in a 45 Ti rotor (Beckman) at 44,000 RPM for 1.5 hrs. Ammonium sulfate
(0.27 g/ml) was added to the resulting supernatant, and the resulting
precipitate was collected by spinning in the 45 Ti rotor at 40,000 RPM for 30
minutes. The resulting pellet was resuspended in buffer H/0.0 (15) and dialyzed
versus H/0.15 M KCl (H with 0.15 M KCl added). Preparation of ORC from this
extract was similar to (15) with the following changes. The dsDNA cellulose
column was omitted from the preparation and only a single glycerol gradient was
performed. Sequencing of peptides derived from ORC subunits was performed
using a modification of an "in gel" protocol described previously (40, 41).
Purified ORC (~10 μg per subunit) was separated by SDS-PAGE and stained with
0.1% Coomassie Brilliant Blue G (Aldrich). After destaining, the gel was soaked
in water for one hour. The protein bands were excised, transferred to a
microcentrifuge tube and treated with 200 ng of Achromobacter protease I
(Lysylendopeptidase: Wako). The resulting peptides were separated by
reverse-phase chromatography and sequenced by automated Edman degradation
(Applied Biosystems model 470).

22. To isolate and assay ORC from ORC2 and orc2-1 cells, four liters of
JRY3687 (orc2-1, MATa, hmrΔA::TRP1 ade2 his3 leu2 trp1 ura3) or the isogenic
wild-type strain JRY3688 (ORC2, MATa, hmrΔA::TRP1 ade2 his3 leu2 trp1 ura3)
were grown to a density of 2 x 10^ cells per ml. Extracts were prepared as
described (24) and fractionated over the first two columns in the preparation of
ORC. The peak fraction of ORC DNA binding activity eluted from the
Q-Sepharose (Pharmacia) column of each preparation was used for subsequent
analysis. Antibodies were raised against the entire ORC complex using a single
mouse. The resulting serum was able to recognize all but the 50 kd subunit of ORC.
Proteins were transferred to nitrocellulose and antigen-antibody complexes were
detected with horseradish peroxidase conjugated secondary antibodies and a
chemiluminescent substrate.

23. Yeast cells were grown to a density of 1-4 x 10"^ cells per ml at 24°C, then
diluted to a density of 2-4 x 10* cells per ml into YPD containing 6 alpha-factor
and incubated for 2-2.5 hours at 24°C (> 90% unbudded cells). For the
hydroxyurea arrest experiments, alpha factor was washed away and the cells were
resuspended in YPD containing 100 mM hydroxyurea and incubated an additional
2.5 hours (> 90% large budded cells). After incubation with the growth
inhibitor, cells were briefly sonicated and plated on YPD plates pre-incubated at
either 24°C or 37°C and observed at 0, 3, and 6 hours after plating.

24. Yeast cells were grown to a density of 1-4 x 10'' cells per ml at 24°C and
diluted into fresh YPD at either 37°C or 24°C and a density of 2-4 x 10* cells per
ml. At times after dilution, 3 x 10* cells were processed as described (42).

25. The positions of the five peptides derived from the 50 kd subunit of ORC in
the ORC6 gene were residues: 51-65; 91-102; 110-105; 207-226; 424-430.
Example 2.

ORC2, a gene required for viability and silencing
In a mutant screen, a temperature-sensitive mutation called orc2-1 was
isolated that, at the permissive temperature, resulted in derepression of HMRa
flanked by the synthetic silencer and did not cause derepression of HMRa flanked
by the wild-type silencer (20). Because the orc2-1 mutant was temperature-sensitive
and silencing defective, it merited further analysis. The temperature
resistance of a heterozygous orc2-1/ORC2 diploid (JRY2640) established that the
mutation was recessive. The diploid was transformed with a plasmid containing
HMRa flanked by a mutant silencer (pJR1212), to provide the MATa1 function
required for sporulation. The temperature-sensitive growth phenotype segregated
2 ts : 2 wild type in each of 23 tetrads, indicating that it was caused by a single
nuclear mutation. An HMLa mata1 HMRa orc2-1 segregant (JRY3683) was
obtained from the diploid following sporulation.

Genetic crosses were used to determine which features in the wild-type
silencer distinguished it from the synthetic silencer with respect to derepression by
orc2-1. A mata1 HMRa strain (JRY3683) containing the orc2-1 mutation was
mated to a MATa strain containing a mutation in the RAP1 binding site of HMR-E
flanking HMRa (the HMRa-e-rap1-10 allele; 5401-la) to determine whether orc2-1
could derepress HMRa in the absence of a functional RAP1 binding site. Of the
96 MATa segregants, 29 had little or no mating ability, and all 29 were
temperature-sensitive for growth. Nineteen of the MATa temperature-sensitive
segregants were mating competent, indicating that the orc2-1 mutation per se was
insufficient to disrupt mating ability, and suggesting that the HMRa-e-rap1-10
allele was required in combination with orc2-1 to block mating ability of a
strains. A MATa temperature-sensitive segregant from this cross, which mated
weakly as an a (JRY4133), was confirmed to have the genotype MATa
HMRa-e-rap1-10 orc2-1.

As further evidence that orc2-1 in combination with HMRa-e-rap1-10
blocked the mating ability of MATa strains, a somewhat unusual cross was used to
simplify the previous cross by having orc2-1 as the only relevant heterozygous
marker. Two MATa HMRa-e-rap1-10 strains (JRY4133 and JRY4132) had
complementary auxotrophic markers, allowing for the selection of the rare
MATa/MATa diploid formed by a mating event between these two strains. This
diploid was able to sporulate due to the low level of expression of HMRa in the
diploid caused by the RAP1-site mutation in the HMR-E silencer (21). One of



wo 95/16694 PCT/US94/ 14563 

these strains had the orc2-1 mutation (JRY4133) and the other did not. As
expected, the temperature sensitivity segregated 2:2 in each of 34 tetrads. All of
the temperature-resistant segregants (two per tetrad) exhibited the a mating
phenotype, and all of the temperature-sensitive segregants were either very weak
a-maters or were unable to mate at all. The absence of any recombinants between
the temperature sensitivity and mating phenotype placed the gene(s) responsible for
the temperature sensitivity and the mating defect less than 1.5 centimorgans apart,
providing strong evidence that a lesion in a single gene was responsible for both
phenotypes. This result was in agreement with the co-reversion of the ts and
mating phenotypes described herein.
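
The bound of 1.5 centimorgans follows from the number of tetrads scored. As a minimal sketch, assuming the standard tetrad mapping formula (map distance in cM = 100 x (T + 6 NPD) / (2 x total tetrads), which the text does not state explicitly), the smallest detectable distance is the one a single tetratype tetrad would have produced. The function name is illustrative and not taken from the source.

```python
def linkage_upper_bound_cm(n_tetrads: int) -> float:
    """Map distance (cM) implied by a single tetratype among n_tetrads.

    Assumes the standard tetrad formula cM = 100 * (T + 6*NPD) / (2 * N)
    with T = 1 and NPD = 0. Observing no recombinant tetrads at all
    therefore places the two traits closer together than this value.
    """
    if n_tetrads <= 0:
        raise ValueError("n_tetrads must be positive")
    return 100.0 * 1 / (2.0 * n_tetrads)


if __name__ == "__main__":
    # 34 tetrads, no recombinants between ts growth and the mating defect
    print(f"{linkage_upper_bound_cm(34):.2f} cM")
```

For 34 tetrads this evaluates to about 1.47 cM, which matches the "less than 1.5 centimorgans" quoted above.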

Isolation of multiple alleles of ORC2

Using the information from this analysis of orc2-1, a second screen was
performed to identify additional mutations in essential genes with a role in silencer
function. This second screen produced 50 mutants that were temperature sensitive
for growth, and in which HMRα (flanked by a mutation in the RAP1-binding site)
was derepressed at a semi-permissive temperature. Complementation tests for both
growth at 37°C and for mating phenotype were performed between orc2-1 and the
collection of temperature-sensitive mutants from the second screen. The collection
of temperature sensitive mutants had the mata1 ste14 genotype, but were able to
mate as a's due to the derepression of HMRα. These mutants were mated to a
mata1 orc2-1 strain (JRY3683) and the diploids were tested for growth at 37°C.
All but three diploids were able to grow at the restrictive temperature. The three
temperature-sensitive diploids were each presumed to be orc2/orc2 homozygotes
due to the inability of the two mutations to complement one another. The mating
type of the diploids was checked to determine whether the defect in repression of
HMR was complemented. All three diploids mated as a's. Thus, the three
mutants were unable to complement either the temperature sensitivity or the mating
phenotype of the original orc2-1 mutation. The new mutations (in strains
JRY4136, 4137 and 4138) were designated orc2-2, orc2-3, and orc2-4.

To investigate the possibility that the new mutations were in a gene other
than ORC2 yet still failed to complement orc2-1, the allelism between orc2-1 and
orc2-3 was tested. The original mata1 orc2-3 ste14 mutant was cured of its
HMRα plasmid, creating JRY4137, and mated with a MATα HMRa-e-rap1-10
orc2-1 strain (JRY3685). In 24 tetrads from this diploid, all segregants were
temperature sensitive for growth, indicating strong linkage between orc2-1 and
orc2-3 (<2 centimorgans). All further studies were performed using the orc2-1
allele, which provided the stronger mutant phenotypes.
Map position of ORC2

Linkage between ORC2 and LYS2, on chromosome II, was evident in
crosses between two lys2 strains (JRY2640 and PSY152) and the original orc2-1
isolate (JRY2903) that placed ORC2 approximately 24 centimorgans from LYS2.
A third cross (JRY4130 x JRY4134) tested the linkage between sec18, which is
centromere proximal to LYS2, and ORC2. Because both orc2-1 and sec18 are
temperature sensitive, an ORC2 allele marked by URA3 (from pJR1423) was used
to determine that SEC18 and ORC2 were separated by 6.6 centimorgans (Table 1).
No previously-mapped genes involved in silencing map near SEC18.
Table 1. Linkage of ORC2 to LYS2 and ORC2 to SEC18

                         Tetrad types         Map distance
Cross                    PD     T      NPD    (cM)

ORC2 vs LYS2             10     14     0      29
ORC2 vs LYS2             20     14     0      21
ORC2 vs LYS2 TOTAL       30     28     0      24
ORC2 vs SEC18            46     7      0      6.6
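
The map distances in Table 1 can be recomputed from the tetrad counts. The sketch below assumes the standard tetrad mapping formula, cM = 100 x (T + 6 NPD) / (2 x (PD + T + NPD)); the source does not name the formula, but it reproduces every reported value. The tetratype count of the first LYS2 cross is read as 14, consistent with the TOTAL row. Function and variable names are illustrative.

```python
def map_distance_cm(pd: int, t: int, npd: int) -> float:
    """Map distance in centimorgans from tetrad classes.

    pd  = parental ditype tetrads
    t   = tetratype tetrads
    npd = nonparental ditype tetrads
    """
    total = pd + t + npd
    if total == 0:
        raise ValueError("at least one tetrad is required")
    return 100.0 * (t + 6 * npd) / (2.0 * total)


# (cross, PD, T, NPD, distance reported in Table 1)
TABLE_1 = [
    ("ORC2 vs LYS2", 10, 14, 0, 29),
    ("ORC2 vs LYS2", 20, 14, 0, 21),
    ("ORC2 vs LYS2 TOTAL", 30, 28, 0, 24),
    ("ORC2 vs SEC18", 46, 7, 0, 6.6),
]

if __name__ == "__main__":
    for cross, pd, t, npd, reported in TABLE_1:
        computed = map_distance_cm(pd, t, npd)
        print(f"{cross:20s} computed {computed:5.1f} cM, reported {reported} cM")
```

Because no nonparental ditype tetrads were recovered in any cross, the 6 NPD term drops out and each distance reduces to half the tetratype frequency, expressed as a percentage.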



The ORC2 mutants arrested with a cell cycle terminal phenotype.

The effect of the orc2-1 mutation on the cell division cycle was explored:
mutant orc2-1 strains were grown in liquid medium at 23°C, the permissive
temperature, and then shifted to 37°C to test whether the cells arrested with a
single terminal morphology. Specifically, orc2-1 cells (JRY3683) were grown to
log phase at the permissive temperature (23°C) and the culture was split. Half of
the culture was grown an additional five hours at the permissive temperature and
the other half was shifted to the nonpermissive temperature (37°C) and grown for
an additional five hours. At that time, both cultures were fixed and stained with
DAPI to allow visualization of the nucleus. In the culture maintained at the
permissive temperature, cells at all phases of the cell cycle were observed. Cells
later in the cell cycle, as evidenced by the presence of large buds, frequently
exhibited nuclei in both the mother and the daughter cell. In contrast, in the
culture shifted to the restrictive temperature, approximately 90% of the cells
arrested as large budded cells. Nuclei were present only in the mother cell and not
in the daughter cells. In addition, the cells were larger than those grown at the
permissive temperature, indicating that protein synthesis and cell wall synthesis
continued in the absence of ORC2 function. Similar results were obtained with two
additional orc2-1 strains (JRY3685 and JRY3687).

ORC2 cells harvested either after continuous growth at the permissive
temperature or after a shift to the nonpermissive temperature were fixed and
stained with DAPI, allowing visualization of DNA with fluorescence microscopy.
The cells grown permissively displayed a range of morphologies from small
unbudded cells to cells with single buds of various sizes. The cells shifted to the
nonpermissive temperature looked very different: the majority arrested as large
budded cells, and for the most part, each mother-daughter pair contained only a
single brightly-staining region, often at or near the neck. These data indicated that
orc2-1 mutants displayed cell cycle defects characteristic of mutants defective in
DNA replication.

Cloning of the ORC2 gene:

The ORC2 gene was cloned by complementation of the orc2-1 temperature
sensitivity (22). One complementing clone (pJR1416) was chosen for further
analysis. Subclones missing various fragments from the insert were retransformed
into an orc2 strain to assay whether the deletion affected the clone's ability to
complement the temperature sensitivity of orc2-1. The key observations were that
the deletion of a 2.8-kb SstI-SstI fragment destroyed complementation activity,
whereas the deletions of flanking sequences (XbaI, and the larger SstI fragment)
had no effect. The 2.8-kb fragment was subcloned (pJR1263), and shown to possess
complementing activity.

To determine whether the gene on the clone was indeed allelic to the ORC2
mutation, a fragment of the original clone was subcloned into a yeast integrating
vector. This plasmid (pJR1423) was cleaved within the insert to direct homologous
integration and transformed into a wild-type strain (W303-1A). As a result, the
site of integration was marked by the plasmid's URA3 gene. The resulting strain
(JRY4134) was crossed to an orc2-1 strain (JRY3685). In each of 59 tetrads,
URA3 segregated opposite to the temperature sensitivity caused by orc2-1,
indicating that ORC2 had indeed been cloned.
ORC2 was essential for cell viability.

ORC2 was disrupted by URA3 (23), and the disrupted allele was integrated into a
diploid homozygous for ura3 and ORC2 (JRY3444). Of the 41 tetrads dissected, 40
tetrads had two live and two dead segregants, and one tetrad had only one live
segregant. The colonies that grew were, without exception, Ura-. By inference,
the dead segregants contained the URA3 gene, and thus the ORC2 disruption,
indicating that ORC2 function was essential for cell viability at all temperatures.
The dead segregants were examined under a microscope to gain some insight into
the true null phenotype. Most of the spores germinated into cells that were
elongated or otherwise deformed and had not divided. In no case did the cell
divide more than two times. Thus in many spores, the absence of ORC2 blocked
cell division but not growth.

Role of ORC2 in Plasmid Replication

To test the role of ORC2 in plasmid stability, an isogenic pair of strains,
one wild type (W303-1B) and one orc2-1 (JRY4125), were transformed with a
plasmid containing a centromere, a suppressor tRNA (SUP11-1), URA3, and
ARS1, a chromosomal origin of replication (YRP14/CEN4/ARS1/ARS1; 24, 25),
selecting for uracil prototrophy. Transformants were grown on selective medium
at 23°C, the permissive temperature for orc2-1. The colonies were picked from
the selective plate, serially diluted, plated onto solid rich medium and grown to
colonies at 23°C. The wild-type transformants grew into colonies most of which
were white, with a few exhibiting red sectors. The small fraction of red colonies
were from cells in the selectively grown colony that had lost the plasmid. In
contrast, the majority of colonies from the orc2-1 mutant were red, reflecting a
high degree of plasmid loss among the cells in the selectively grown colony.
Moreover, in the orc2-1 strain, red sectors were present in the majority of white
colonies, with some white colonies displaying multiple red sectors.

It is possible to quantitate the number of cell cycles in which a plasmid is
lost from the number of colonies that are half red and half white. Only those
cells that lose the plasmid in the first cell division form half red, half white
colonies. In the case of the wild-type strain, 0.9% (10 / 1168) of the colonies
were half red and half white, indicating that the plasmid was lost in 0.9% of cell
cycles. In contrast, the frequency of half red and half white colonies in the orc2-1
strain grown at the permissive temperature was 11% (58 / 512), indicating that the
same plasmid was lost approximately 12 times as often in the strain with partially
defective Orc2p. These data indicated a profound defect in plasmid stability
specific to the orc2-1 strain, and in combination with the cell-cycle phenotype of
orc2-1, suggested that orc2-1 strains were defective in DNA replication. These
results were consistent with the flow cytometry studies of orc2-1 strains herein.
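
The loss rates quoted above can be checked directly from the colony counts. The following is a small sketch, not part of the original analysis; the names are illustrative. It shows that the "approximately 12 times" figure corresponds to the ratio of the rounded percentages (11% / 0.9%), while the unrounded counts give a somewhat larger ratio of about 13.

```python
def loss_rate(half_sectored: int, total: int) -> float:
    """Fraction of colonies that are half red, half white.

    Under the reasoning in the text, this approximates the fraction of
    cell divisions in which the plasmid is lost.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return half_sectored / total


# Colony counts reported in the text: (half-sectored, total)
WILD_TYPE = (10, 1168)
ORC2_1 = (58, 512)

wt_rate = loss_rate(*WILD_TYPE)
mutant_rate = loss_rate(*ORC2_1)

# Ratio from the raw counts and from the percentages as rounded in the text
fold_from_counts = mutant_rate / wt_rate
fold_from_rounded = round(mutant_rate * 100) / round(wt_rate * 100, 1)

if __name__ == "__main__":
    print(f"wild type: {wt_rate:.2%}  orc2-1: {mutant_rate:.2%}")
    print(f"fold increase (raw counts): {fold_from_counts:.1f}")
    print(f"fold increase (rounded percentages): {fold_from_rounded:.1f}")
```
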
Sequence of ORC2

The sequence of the 2.8-kb SstI-SstI orc2-complementing fragment was
determined and deposited in Genbank (Accession #L23924). The only open
reading frame of significant length was deduced to be ORC2, and predicted a 620
residue protein of approximately 68 kD. The SstI fragment included 806-bp of
upstream sequence and 140-bp of downstream sequence.
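
These figures are mutually consistent, as a short sketch shows. It relies on two assumptions that are not from the source: an average residue mass of about 110 Da (a common rule of thumb) and an open reading frame that ends in a single stop codon, with the upstream and downstream lengths measured from the ends of that reading frame.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from the number of residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


def fragment_length_bp(n_residues: int, upstream_bp: int, downstream_bp: int,
                       include_stop: bool = True) -> int:
    """Length of a fragment carrying an ORF plus flanking sequence.

    Assumes the flanks are measured from the ends of the reading frame and,
    by default, that the ORF carries one stop codon.
    """
    codons = n_residues + (1 if include_stop else 0)
    return upstream_bp + 3 * codons + downstream_bp


if __name__ == "__main__":
    print(f"mass: ~{approx_mass_kda(620):.1f} kDa")
    print(f"fragment: {fragment_length_bp(620, 806, 140)} bp")
```

Under these assumptions a 620 residue protein comes to roughly 68 kDa, and the reading frame plus the stated flanks add up to about 2.8 kb, in line with the size of the SstI fragment.
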

The deduced Orc2p protein was 15% basic residues and 16%
serine/threonines. Fully 50% of the N-terminal residues (residues 15-280) were
lysine, arginine, proline, serine, or threonine. The KeyBank motif program
revealed several matches to peptide motifs within Orc2p. Orc2p contained many
potential phosphorylation sites: 3 for cAMP- and cGMP-dependent protein kinase
(starting at residues 57, 433 and 546), 12 for protein kinase C (24, 41, 42, 89,
101, 102, 176, 321, 335, 431, 521, and 549) and 14 for casein kinase II (60, 148,
149, 182, 238, 270, 389, 481, 486, 491, 505, 552, 595, and 605), and a match to
the nuclear targeting sequence (residues 103-107). The DNA sequence contained a
perfect match to the RAP1 binding site consensus (starting at nucleotide 595), and
two near matches (12/15) to the ABF1-binding consensus sequence (starting at 12
and 609). It was determined by sequence homology that a lysyl tRNA synthetase
gene is located to the left of the SstI fragment shown here (Mirande and Waller,
1988), and a kinase homolog to the right.

Another homology is with the region near the catalytic domain of human
topoisomerase I, which has diverged among topoisomerase I proteins from
other species except for the region surrounding the invariant active-site tyrosine.
This region includes a consensus sequence consisting of a serine and lysine residue
near the tyrosine (25). The Orc2p protein also contained such a consensus
sequence near its C-terminus. However, mutation of this putative active-site
tyrosine to phenylalanine had no detectable effect on the ability of ORC2 to
complement the temperature-sensitivity or mating defect of an orc2-1 strain.
Table 2. Strain list.

Strain     Genotype (a)

DBY1034    MATa his4-539 lys2-801 ura3-52 SUC
W303-1A    MATa ade2-1 can1-100 his3-11,15 leu2-3,112 trp1-1 ura3-1
W303-1B    MATα ade2-1 can1-100 his3-11,15 leu2-3,112 trp1-1 ura3-1
PSY152     MATa his3Δ200 leu2-3,112 lys2-801 ura3-52
JRY4130    MATa his4 ura3 sec18
JRY438     MATa Gal+ his4-519 leu2-3,112 SUC2 ura3-52
JRY543     MATa/MATα ade2-101/ade2-101 his3Δ200/his3Δ200
           lys2-801/lys2-801 met2/MET2 TYR1/tyr1 ura3-52/ura3-52
JRY2640    mata1 ade2 leu2-3,112 lys2-801 ura3
JRY2698    MATa HMRa ade2-101 his3 leu2 trp1 ura3-52
JRY2699    MATa HMRa ade2-101 his3 leu2 trp1 ura3-52 sir4ΔN::HIS3
JRY2700    MATa HMRa ade2-101 his3 leu2 trp1 ura3-52 + pJR924
JRY2903    MATa HMRa ade2-101 his3 leu2 orc2-1 trp1 ura3-52
JRY2904    MATa HMRa ade2-101 his3 leu2 orc2-1 trp1 ura3-52 + pJR924
JRY3444    MATa/MATα ade2-101/ade2-101 his3Δ200/his3Δ200
           lys2-801/lys2-801 met2/MET2 TYR1/tyr1
           ura3-52/ura3-52 orc2::Tn10LUK/ORC2
JRY3683    mata1 {HMRa} ade2 his3 leu2 orc2-1 ura3
JRY3685    MATa HMRa-e-rap1-10 ade2 leu2 trp1 orc2-1 ura3
JRY3687    MATa hmrΔA::TRP1 ade2 his3 leu2 trp1 ura3 orc2-1
JRY3690    MATa HMRa-e-rap1-10 ade2 his3-11,15 leu2 orc2-1 trp1 ura3
JRY4125    MATa ade2-1 can1-100 his3-11,15 leu2-3,112 orc2-1 trp1-1 ura3-1
JRY4132    MATa HMRa-e-rap1-10 ade2 his3 ura3
JRY4133    MATa HMRa-e-rap1-10 ade2 leu2 orc2-1 trp1 ura3
JRY4134    MATa ade2-1 can1-100 his3-11,15 leu2-3,112 trp1-1 ura3-1
           ORC2::pJR1423
JRY4135    mata1 ade2 leu2-3,112 lys2-801 ura3 ste14
JRY4136    mata1 ade2 leu2-3,112 lys2-801 orc2-2 ura3 ste14
JRY4137    mata1 ade2 leu2-3,112 lys2-801 orc2-3 ura3 ste14
JRY4138    mata1 ade2 leu2-3,112 lys2-801 orc2-4 ura3 ste14

(a) Unless otherwise noted, all strains were HMLa and HMRa. HMRa-e-rap1-10
refers to the allele of HMR-E, originally described as PAS1-1, that
contains a mutation in the RAP1 binding site (21).

Numbered Citations for Example 2. 

1. I. Herskowitz, et al., Cold Spring Harbor Laboratory Press 583 (1992).

2. J. Abraham, J. Feldman, K.A. Nasmyth, J.N. Strathern, J.R. Broach, and
J. Hicks, C.S.H. Symp. Quant. Biol. 47, 989 (1982). J.B. Feldman, J.B. Hicks,
and J.R. Broach, J. Mol. Biol. 178, 815 (1984).

3. J. Rine, and I. Herskowitz, Genetics 116, 9 (1987).

4. Kurtz et al., Genes Dev. 5, 616 (1991); Sussel et al., PNAS 88, 7749 (1991).

5. J.R. Mullen, et al., EMBO J. 8, 2067 (1989).

6. P.S. Kayne, et al., Cell 55, 27 (1988). L.M. Johnson, et al., Proc. Natl.
Acad. Sci. USA 87, 6286 (1990). P.D. Megee, et al., Science 247, 841 (1990). E.
Park, and J. Szostak, Mol. Cell. Biol. 10, 4932 (1990).

7. P. Laurenson, and J. Rine, Microbiol. Rev. 56, 543 (1992).

8. Brand, et al., Cell 41, 41 (1985); Kimmerly, et al., EMBO J. 7, 2241
(1988).

9. D. Shore, and K. Nasmyth, Cell 51, 721 (1987).

10. M.S. Longtine, et al., Curr. Genet. 16, 225 (1989).

11. A.R. Buchman, et al., Mol. Cell. Biol. 8, 5086 (1988).

12. J.F.X. Diffley, and J.H. Cocker, Nature 357, 169 (1992).

13. A.R. Buchman, and R.D. Kornberg, Mol. Cell. Biol. 10, 887 (1990).

14. J.A. Huberman, et al., Nucleic Acids Res. 16, 6373 (1988). B.J. Brewer,
and W.L. Fangman, Cell 51, 463 (1987).

15. S.P. Bell and B. Stillman, Nature 357, 128 (1992).

16. F.J. McNally, and J. Rine, Mol. Cell. Biol. 11, 5648 (1991).

17. A.M. Miller, and K.A. Nasmyth, Nature 312, 247 (1984).

18. D.H. Rivier, and J. Rine, Science 256, 659 (1992).

19. Two genetic screens were devised to identify temperature sensitive
mutations in essential genes involved in silencing. The screen that led to isolation
of orc2-1 started with JRY2698 (HMLa, MATa, HMRa, ade2, his3, leu2, trp1,
ura3-52), which had α mating-type cassettes at all three chromosomal mating-type
loci and was transformed with a plasmid (pJR924) containing the a mating-type
cassette at HMR (JRY2700). The plasmid-borne HMRa locus had two synthetic
silencers substituted for the E silencer, and also had a deletion of the I element.
The use of two silencers rather than one minimized the risk of being distracted by
site mutations in the silencer. One hundred and sixty two thousand
EMS-mutagenized colonies were grown on supplemented minimal media (without
uracil) at 25°C and screened for derepression of the plasmid-borne a cassette at
HMR. Mutagenized colonies were replica-plated onto lawns of the mating tester
strain DBY1034 (MATa, his4-539, lys2-801, ura3-52) on minimal media either
with or without uracil supplementation. Replicas were incubated at 25°C for one
hour, then overnight at 30°C. Only plasmid-containing JRY2700 cells were able to
mate with the tester strain to yield diploids capable of growing on the
unsupplemented plates because the only functional URA3 gene was on the plasmid.

Cells bearing mutations causing derepression of the plasmid-borne a
cassette could be distinguished from the other classes of mutations by exploiting a
feature of yeast plasmids. Approximately 10% of the cells in these colonies lacked
the plasmid and thus could, in principle, mate with the tester strain and form Ura-
diploids capable of growth on the plates supplemented with uracil. If a colony had
a mutation in the mating response pathway, the cells would be unable to mate even
in the absence of the plasmid, and thus would be unable to form diploids capable
of growth on medium supplemented with uracil. Twenty eight strains were
identified that were temperature-sensitive for growth and that mated with the tester
strain only on plates supplemented with uracil. Plasmid-free isolates of each strain
were then retransformed with the plasmid bearing the synthetic silencer at the
HMRa locus (pJR924) and with the plasmid bearing the wild-type HMRa locus
(pJR919; McNally and Rine, 1991). Three strains were able to mate when
carrying the wild-type HMR locus (pJR919) but not when carrying the synthetic
silencer-containing HMR locus (pJR924). In order to determine if the ts growth
phenotype and the mating phenotype were due to the same mutation, spontaneous
revertants of the ts phenotype were selected. A spontaneous revertant of the ts
growth of one strain, JRY2904, mated as well as the wild-type JRY2700,
suggesting that the mating phenotype and temperature-sensitive growth were due to
the same mutation, which was named orc2-1.

20. Y. Kassir, et al., Genetics 109, 481 (1985). Foss and Rine, Genetics, (1993).
21. The ORC2 gene was cloned by complementation of the temperature
sensitivity of orc2-1. An orc2-1 strain (JRY3683) was transformed with a CEN
LEU2-based Saccharomyces cerevisiae genomic library (32). Approximately 1000
to 1500 transformants formed colonies at 23°C. Replica prints of these colonies
were incubated at 37°C to screen for the ability to grow at elevated temperatures.
Plasmids were isolated from temperature-resistant strains and retested. Those
plasmids that complemented the defect a second time were analyzed by restriction
digestion. One plasmid from the CEN-LEU2 library (pJR1416) was chosen for
further analysis.

22. ORC2 was disrupted with the Tn10 LUK transposon (33), which inserted
within the ORC2 coding sequence on the plasmid (pJR1146) carrying the SstI
orc2-1 complementing fragment. Plasmid pJR1147 had the Tn10LUK insertion within
the ORC2 coding region. The ORC2-containing SstI fragment, disrupted by the
transposon, was removed from pJR1147 by partial digestion with SstI. The
fragment was transformed into the wild-type diploid JRY543. The integration of
this disruption allele at the ORC2 locus was confirmed by DNA blot hybridization
analysis (Southern, 1975), and the diploid was named JRY3444.
23. P. Hieter, C. Mann, M. Snyder, and R.W. Davis, Cell 40, 381 (1985).
24. D. Koshland, J.C. Kent, and L.H. Hartwell, Cell 40, 393 (1985). R.M.
Lynn, et al., Proc. Natl. Acad. Sci. USA 86, 3559 (1989). W.-K. Eng, S.D.
Pandit, and R. Sternglanz, J. Biol. Chem. 264, 13373 (1989).

25, 26. A.H. Brand, G. Micklem, and K. Nasmyth, Cell 51, 709 (1987).
27. S. Shuman, et al., Proc. Natl. Acad. Sci. USA 86, 9793 (1989).

28, 29. J. Singh, and A.J.S. Klar, Genes and Dev. 6, 186 (1992).

30. D.D. Dubey, et al., Mol. Cell. Biol. 11, 5346 (1991).

31. C.A. Hrycyna, et al., EMBO J. 10, 1699 (1991).

32. A mutation was introduced into the RAP1 binding site at HMR-E adjacent
to the HMRa locus by oligonucleotide-directed mutagenesis (35), and the change
confirmed by sequencing. The RAP1 site mutation was identical to the PAS1-1
mutation of HMR-E characterized previously that blocks RAP1 protein binding in
vitro (21), and is described here as HMRa-e-rap1-10. The plasmid consisting of
the HMRa-e-rap1-10 HindIII fragment in pRS316 was named pJR1425. The
wild-type HMRa version of the same plasmid was named pJR1426. Approximately
100,000 mutagenized cells from 12 independent cultures of the HMLa mata1
HMRa ste14 strain with the HMRa plasmid (pJR1425) were grown into colonies at
23°C and replica-plated to a MATa ura3 mating-type tester lawn (PSY152) to
identify mutants exhibiting the a mating phenotype. The mating plates were
incubated at 30°C in order to identify mutants defective enough to be derepressed
at HMR yet not so defective as to be inviable. Of nine hundred haploid mating
proficient colonies that were picked, fifty mutants were temperature sensitive for
growth at 37°C to some degree. These mutants were subjected to further study and
the remainder were discarded. All 50 mutants were recessive to wild-type. Only
the subset of mutants relevant to ORC2 is presented here; the remainder will be
discussed elsewhere.

33. The ORC2 gene was defined by the orc2-1 mutation. An orc2-complementing
plasmid (pJR1416) was obtained by complementation of the temperature
sensitivity of orc2-1. In order to map the approximate position of the
orc2-complementing gene in the plasmid, six derivatives of pJR1416 were made
and tested for complementation. The SalI-SalI fragment was removed from the
insert to yield pJR1418. Three adjacent XbaI-XbaI fragments were removed to
yield pJR1422. SphI cleaved once in the insert and once just inside the vector.
Deleting this SphI-SphI fragment produced pJR1417. Cleavage by SstI released
two fragments from the insert. Deletion of both fragments created pJR1419.
Isolates in which only the larger SstI fragment (pJR1421) or only the smaller SstI
fragment (pJR1420) was deleted were also recovered. The 2.8-kb SstI-SstI
orc2-complementing fragment was cloned into the SstI site of the CEN URA3 vector
pRS316 (36), to yield pJR1263. Two plasmids were made which allowed the
chromosomal integration of part or all of ORC2. The first, pJR1423, contained an
XhoI-KpnI insert (from pJR1416) which extended from a few kb upstream of the
ORC2 start codon to about 60 bp upstream of the stop codon, inserted into
XhoI-KpnI-cut pRS306 (36), a yeast integrating vector marked by URA3. The second
plasmid, pJR1424, contained the SstI orc2-complementing fragment inserted into
the SstI site of pRS306.

34. F. Spencer, et al., Genetics 124, 237 (1990).

35. O. Huisman, et al., Genetics 116, 191 (1987).

36. E.M. Southern, J. Mol. Biol. 98, 503 (1975).

37. T.A. Kunkel, et al., Methods Enzymol. 154, 367 (1987).

38. R.S. Sikorski and P. Hieter, Genetics 122, 19 (1989).

Example 3.

In order to identify potential yeast initiators, we developed a genetic
strategy, the one-hybrid system, to find proteins that recognize a target sequence of
interest. The one-hybrid system has two basic components: (i) a hybrid expression
library, constructed by fusing a transcriptional activation domain to random protein
segments, and (ii) a reporter gene containing a binding site of interest in its
promoter region. Hybrid proteins that recognize this site are expected to induce
expression of the reporter gene, because of their dual ability to bind the promoter
region and activate transcription (8). This association may be indirect, since
hybrids that interact with endogenous proteins already occupying the binding site
will also activate transcription (7). Nevertheless, as long as the association is
sequence-specific, the protein incorporated in the hybrid should be functionally
relevant.

We have used this method to look for proteins from the yeast
Saccharomyces cerevisiae that recognize the ARS consensus sequence (ACS) of
yeast origins of DNA replication. The protein component of this screen was
provided by a set of three complementary yeast hybrid expression libraries, YL1-3,
containing random yeast protein segments fused to the GAL4 transcriptional
activation domain (GAL4AD) (9). The reporter gene for our screen contained four
direct repeats of the ACS in its promoter region and was integrated into the yeast
strain GGY1 to form JLY363 (wild-type ACS) (10). To determine the dependence of
lacZ induction on the ACS, we constructed in parallel JLY365 (mutant ACS), which
harbors a reporter gene carrying four copies of a nonfunctional multiply-mutated
ACS (Fig. 4) (10).

We isolated nine plasmids that induced greater lacZ activity in
JLY363 (wild-type ACS) than in JLY365 (mutant ACS) from a screen of 1.2 million
YL1-3 transformants (11). Many of the plasmids that induced lacZ activity on initial
screening of the library in JLY363 failed to exhibit a dependence on the
ACS when introduced into JLY365. Restriction analysis of these
plasmids showed that the nine isolates represented five genomic clones, which we
initially labeled AAP1-5 for ACS-associated protein. AAP1 was isolated four times,
AAP5 twice, and the others only once.
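
The screen bookkeeping can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, using only the counts reported above and in note 11; the variable and function names are illustrative, and the reading of the plating density in note 11 as 2,000 to 5,000 colonies per plate is an assumption.

```python
import math

# Counts reported in the text and in note 11.
transformants = {"YL1": 500_000, "YL2": 500_000, "YL3": 200_000}
positives_after_purification = {"YL1": 15, "YL2": 22, "YL3": 12}
isolations_per_clone = {"AAP1": 4, "AAP5": 2, "AAP2": 1, "AAP3": 1, "AAP4": 1}


def plates_needed(colonies, per_plate):
    """Plates required at a given colony density, rounded up."""
    return math.ceil(colonies / per_plate)


total_transformants = sum(transformants.values())
total_positives = sum(positives_after_purification.values())
acs_dependent = sum(isolations_per_clone.values())

print(total_transformants)   # 1200000
print(total_positives)       # 49
print(acs_dependent)         # 9

# ASSUMPTION: "2-5000 colonies" in note 11 means 2,000 to 5,000 per plate.
print(plates_needed(total_transformants, 5000),
      plates_needed(total_transformants, 2000))   # 240 600
```

The totals agree with the 1.2 million transformants, 49 purified positives, and nine ACS-dependent isolates stated in the text.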

To examine the sequence specificity of lacZ induction with finer resolution,
reporter constructs containing direct repeats of four ACS point mutants were each
integrated into GGY1 to generate the set of reporter strains (10). The five AAP
clones were individually examined in these strains for the ability to induce lacZ
expression. AAP1 displayed a correspondence between the induction of this set of
reporter genes and the ARS function (12) of their ACS. The AAP5 hybrid
exhibited a slightly weaker correlation, and the remaining clones showed poor
correlation. These findings suggest that AAP1, and possibly AAP5, encodes a
protein that recognizes the ACS in a sequence-specific manner. Constructs with
deletions in the AAP1 coding sequence (14) were unable to induce lacZ expression,
indicating that recognition of the ACS resided in the protein segment fused to
GAL4.

The genomic segments fused to the GAL4AD in AAP1-5 were sequenced (15)
to determine the extent of the hybrid proteins that were made. AAP1 and AAP5
had sizable protein coding sequences of 301 and 123 amino acids, respectively,
fused in frame with the GAL4AD. In principle, these segments are large enough to
direct the hybrid protein to the promoter of the reporter gene. AAP2-4 encoded
hybrid proteins with only short peptide extensions (10, 22, and 38 amino acids,
respectively) fused to the GAL4AD, suggesting that these hybrids were not
responsible for the transcriptional induction attributed to these clones. Because of
this finding and the lack of proper sequence specificity for the ACS element,
AAP2-4 were not studied further.

The full-length gene for AAP1 was cloned from a yeast genomic library
and sequenced (15) (Genbank accession no. L23323). AAP1 contains an open
reading frame for a protein 435 amino acids long with a predicted molecular
weight of 50,302 daltons. The hybrid GAL4AD-AAP1 protein obtained from the
screen was a fusion of the GAL4AD to the C-terminal two-thirds of the predicted
full-length protein (residues 135-435), indicating that this portion of the molecule
is sufficient for association with the ACS. Comparison of peptide sequences from
the 50-kd subunit of ORC with the predicted protein sequence from AAP1
demonstrated that our gene encodes this subunit and confirmed the association
between the AAP1 protein and the ACS. Because of this identity, we have
renamed our gene ORC6.
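
The figures quoted for the hybrid segment are mutually consistent, as a short calculation shows. This is a minimal sketch using only the numbers given above (435 residues, 50,302 daltons, residues 135-435); the constant and function names are illustrative.

```python
FULL_LENGTH_AA = 435                   # predicted protein length, from the text
PREDICTED_MASS_DA = 50_302             # predicted molecular weight, from the text
HYBRID_START, HYBRID_END = 135, 435    # residues retained in the GAL4 hybrid


def segment_length(start, end):
    """Number of residues in an inclusive, 1-based residue range."""
    return end - start + 1


hybrid_len = segment_length(HYBRID_START, HYBRID_END)
fraction = hybrid_len / FULL_LENGTH_AA
mean_residue_mass = PREDICTED_MASS_DA / FULL_LENGTH_AA

print(hybrid_len)                    # 301
print(round(fraction, 2))            # 0.69
print(round(mean_residue_mass, 1))   # 115.6
```

The 301-residue segment matches the coding length reported for the AAP1 hybrid and covers roughly two-thirds of the predicted protein.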

An overlapping ORF capable of encoding a protein 250 amino acids long
exists on the complementary strand. The positions of the predicted start and stop
codons for this ORF are at nt 1615-7 and nt 865-7, respectively. In pJL766 the C
residue at 1471 was mutated to a T, preserving the amino acid sequence of ORC6
but introducing a stop codon in this overlapping ORF. The sequence of ORC6
indicates a connection with the regulatory machinery governing cell cycle
progression. Orc6p contains four phosphorylation sites, (S/T)PXK, for
cyclin-dependent protein kinases (20) clustered in the first half of the molecule. Using
the more relaxed consensus site (S/T)P adds two more sites to this cluster. We
have observed Orc6p phosphorylated in vivo on serine and threonine residues.
However, since the initiation of yeast DNA replication commences promptly in
response to the activation of this protein kinase in G1, we believe that Orc6p and
possibly other ORC subunits are regulated substrates of this kinase. Finally, as
expected for a protein participating in nuclear events, Orc6p contains a potential
nuclear localization signal (NLS) within the (S/T)PXK cluster and one in the
C-terminal domain (amino acid residues 117-122 and 263-279). Orc6p can be seen
in the nucleus by immunofluorescence.
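
The two consensus definitions used above, (S/T)PXK and the relaxed (S/T)P, can be located in a protein sequence by simple pattern matching. The following is a minimal sketch; the peptide shown is a hypothetical toy sequence, not the Orc6p sequence, and the function names are illustrative.

```python
import re

# Strict cyclin-dependent kinase consensus (S/T)PXK and the relaxed form (S/T)P.
# Lookaheads allow overlapping candidates to be reported independently.
STRICT_CDK = re.compile(r"(?=([ST]P.K))")
RELAXED_CDK = re.compile(r"(?=([ST]P))")


def find_sites(protein, pattern):
    """Return 1-based positions of the phosphoacceptor S/T for each match."""
    return [m.start() + 1 for m in pattern.finditer(protein)]


def relaxed_only(protein):
    """Positions matching (S/T)P but not the strict (S/T)PXK consensus."""
    strict = set(find_sites(protein, STRICT_CDK))
    return [p for p in find_sites(protein, RELAXED_CDK) if p not in strict]


# HYPOTHETICAL toy peptide, used only to exercise the patterns.
toy = "MASPEKLLTPGKAASPVQ"
print(find_sites(toy, STRICT_CDK))   # [3, 9]
print(relaxed_only(toy))             # [15]
```

Applied to a real sequence, the strict pattern would report the (S/T)PXK sites and `relaxed_only` the additional sites gained under the relaxed consensus.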

A marked deletion of the ORC6 gene (pJL731) (21), removing all but 13
codons from its open reading frame, was introduced into diploids from three
different strain backgrounds. The resulting heterozygous ORC6 deletion strains,
JLY481, JLY475, and JLY469, were induced to undergo meiosis, and 20 tetrads of
each strain were dissected (21). In all backgrounds the ORC6 disruption
cosegregated with inviability, demonstrating that ORC6 is essential for cell growth.
Microscopic examination revealed that mutant spores from JLY481 and JLY475
germinated, completed 1-2 rounds of cell division, and then arrested with a
uniform large bud morphology reminiscent of cell division cycle mutants defective
in DNA replication or nuclear division (22). The position of cell cycle arrest could
not be pinpointed, however, since the DNA content of these cells could not be
readily measured. Mutant spores derived from JLY469 germinated poorly.

The interpretation of these ORC6 deletion experiments was complicated by
the presence of a second open reading frame (ORF2) of 250 amino acids on the
antisense strand of the ORC6 gene. ORF2 spans nucleotides 1617 to 868 of the
Genbank sequence and overlaps the C-terminal two-thirds of the ORC6 coding
sequence. A marked deletion that removed the N-terminal third of the ORC6
coding sequence without affecting ORF2 (pJL733) was introduced into diploids
(21). Tetrad analysis again showed the ORC6 deletion cosegregating with cell
death. Finally, an ORC6 gene was constructed that contains a silent codon change
for the ORC6 ORF but introduces a UGA stop codon in ORF2 (22). This gene
was able to rescue a haploid strain containing a full deletion of the ORC6 ORF.
We conclude that ORC6 is essential for cell viability.
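
The logic of the silent-change construct, in which one base substitution leaves the sense-strand protein unchanged while creating a stop codon in the antisense reading frame, can be illustrated with the standard genetic code. The following is a minimal sketch; the six-base sequence is a hypothetical toy example and is not taken from ORC6, and only the ORF2 coordinates (1617 to 868) come from the text.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order ("*" marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate the complete codons of dna starting at a 0-based offset."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def codons_in_span(first_nt, last_nt):
    """Whole codons in an inclusive nucleotide span, in either orientation."""
    return (abs(first_nt - last_nt) + 1) // 3


# ORF2 coordinates from the text: coding span 1617 down to 868.
print(codons_in_span(1617, 868))                            # 250

# HYPOTHETICAL six-base toy sequence with a single C-to-T change.
sense_wt, sense_mut = "GACCAT", "GATCAT"
print(translate(sense_wt), translate(sense_mut))            # DH DH
print(translate(reverse_complement(sense_wt), frame=1))     # W
print(translate(reverse_complement(sense_mut), frame=1))    # *
```

In the toy case the sense strand still reads Asp-His after the change, while the antisense codon TGG (Trp) becomes TGA (stop), which is the same kind of outcome described for the C-to-T change at nucleotide 1471.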

Our results validate the one-hybrid system screen as a method to identify
and clone genes for proteins that recognize a DNA sequence of interest. This
screen has also been successful in identifying DNA-binding proteins (23), and a
variation of this screen has been used to identify a binding site for a suspected
DNA-binding protein (24). The one-hybrid approach is particularly useful for
proteins that are difficult to detect biochemically or for which starting material in a
purification is difficult to obtain.

We identified genes that interact genetically with ORC6 using established
cdc mutants because germinating spores bearing an ORC6 deletion appeared to
exhibit a cell division cycle phenotype. pJL749 (28), a plasmid that overexpresses
Orc6p several hundred-fold, was introduced into a virtually isogenic set of
temperature-sensitive cdc mutants arresting at various points in the cell cycle (29).
Overexpression of ORC6 selectively affected cdc6 and cdc46 mutants, lowering
their restrictive temperature by 5-7°C; there was no significant effect on the other
mutants examined or on the wild-type strain (Table 1).



Strain     cdc mutant   Viability with overexpression of ORC6
RDY488     wild-type    +
RDY501     cdc28-1      +
RDY510     cdc4-1       +
RDY664     cdc34-2      +
RDY543     cdc7-4       +
JLY310     cdc6-1       -
JLY179     cdc46-1      -
JLY338     cdc2-1       +
JLY353     cdc17-1      +
RDY619     cdc15-2      +

Table 1. Viability of cdc Mutants in the Presence of High Levels of ORC6
Expression. pJL749 (GALp-HA-ORC6), pJL772 (GALp-HA), and pRS425 were
introduced into each cdc mutant, which was then examined for growth at various
temperatures under conditions that induce expression of ORC6 (28, 29). + indicates
mutants whose restrictive temperature remains unchanged in the presence of pJL749
relative to pJL772 and pRS425. - indicates mutants whose restrictive temperature is
lowered 5-7°C when pJL749 is present.

Numbered Citations for Example 3

1. Kelly, J. Biol. Chem. 263, 17889 (1988); Marians, Annu. Rev. Biochem.
61, 673 (1992); Kornberg, Baker, DNA Replication (Freeman and Company,
New York, 1992); B. Stillman, Annu. Rev. Cell Biol. 5, 197 (1989).

2. M. L. DePamphilis, Annu. Rev. Biochem. 62, 29 (1993).

3. Campbell and Newlon, in The Molecular and Cellular Biology of the Yeast
Saccharomyces, Broach, et al., Eds. (CSHL Press, 1991), vol. 1, pp. 41-146.

4. Fangman and Brewer, Annu. Rev. Cell Biol. 7, 375 (1991).

5. J.R. Broach et al., Cold Spring Harbor Symp. Quant. Biol. 47, 1165
(1983); Van Houton and C. S. Newlon, Mol. Cell. Biol. 10, 3917 (1990).

6. Y. Marahrens and B. Stillman, Science 255, 817 (1992).

7. S. Fields and O.-K. Song, Nature 340, 245 (1989); C.-T. Chien, P.T.
Bartel, R. Sternglanz, S. Fields, Proc. Natl. Acad. Sci. USA 88, 9578 (1991).

8. R. Brent and M. Ptashne, Cell 43, 729 (1985).

9. The N-terminal portions of the hybrids from three related hybrid expression
libraries, YL1-3 (7), consist of the SV40 nuclear localization signal and amino
acids 768-881 of the GAL4 activation domain (GAL4AD). The C-terminal portions
were derived from random yeast protein segments which have been fused to the
end of the GAL4AD. These segments are encoded by short (1-3 kb) fragments from
a Sau3A partial digest of yeast genomic DNA. Together, YL1-3 ensure that all
three reading frames of these fragments can be expressed.

10. pLR1Δ1 is described in R.W. West Jr., R.R. Rogers, M. Ptashne, Mol.
Cell. Biol. 4, 2467 (1984). We generated pBgl-lacZ from pLR1Δ1 by (i)
substituting an XhoI-BglII-XhoI polylinker for the XhoI linker and (ii) precisely
excising a HindIII fragment containing 2μ sequences. The resulting vector has a
unique BglII site approximately 100 bp upstream of the TATA box for insertion of
DNA sequences in the promoter region and a unique StuI site for targeted
integration of the plasmid at the URA3 locus. Multiple direct repeats of ARS1
domain A and several of its mutant derivatives were inserted into the BglII site of
pBgl-lacZ to generate all the reporter genes used in this work. The inserted repeat
elements, derived from complementary oligonucleotides, were oriented with the
TATA box to their right. Each reporter gene construct was integrated into the
URA3 locus of GGY1 (MATa Δgal4 Δgal80 ura3 leu2 his3 ade2 tyr) [G. Gill and
M. Ptashne, Cell 51, 121 (1987)] to create a reporter strain. Integration of
pBgl-lacZ into GGY1 generated JLY387.

11. YEPD (rich complete) and SD (synthetic dropout) media are as described
[J.B. Hicks and I. Herskowitz, Genetics 83, 245 (1976)]. Standard methods were
used for manipulation of yeast cells [C. Guthrie and G.R. Fink, Eds., Guide to
Yeast Genetics and Molecular Biology (Academic Press, San Diego, 1991)] and
DNA [F.M. Ausubel et al., Eds., Current Protocols in Molecular Biology (Wiley,
New York, 1989)]. Libraries YL1-3 were transformed [R.H. Schiestl and R.D.
Gietz, Current Genetics 16, 339 (1989)] into JLY363 (10) and plated on SD-Leu at
a density of 2-5000 colonies per 10-cm plate. 500,000 transformants were obtained
for each of YL1 and YL2, and 200,000 for YL3. Transformants were assayed on
filters for production of β-galactosidase [L. Breeden and K. Nasmyth, Cold Spring
Harbor Symp. Quant. Biol. 47, 643 (1985)]. Forty-nine isolates remained positive
after colony purification (15 from YL1, 22 from YL2, and 12 from YL3), and library
plasmids were extracted from them. These plasmids were each transformed into both
JLY363 and its mutant counterpart JLY365 (10). Nine plasmids induced greater
β-galactosidase activity in the wild-type reporter strain than in the control. These
plasmids were classified into five clones, AAP1-5, based on their HindIII
restriction pattern. Each clone was then retested in JLY360, JLY361, JLY387,
JLY429, JLY431, JLY433, and JLY435. The AAP1 hybrid clone was called pJL720.
The AAP1 gene was later renamed ORC6.

12. The ARS function of the mutant sequences was analyzed in the context of
ARS1 domain B (BglII-HinfI fragment, nt 853-734) in the following CEN-based
URA3-containing plasmids: pJL347 (wt), pJL243 (multiple), pJL326 (A863T),
pJL338 (T869A), pJL330 (T862C), and pJL316 (T867G). These plasmids were
transformed into JLY106 (MATa ura3 leu2 his3 trp1 lys2 ade2) and its
homozygous diploid counterpart JLY162. pJL243, pJL326, and pJL338 did not
yield a high frequency of transformation and could not be assayed quantitatively
for ARS function. pJL347, pJL330, and pJL316 transformed cells with high
efficiency and were assayed for mitotic stability [Stinchcomb, et al., Nature 282,
39 (1979)].

13. pJL720, the ORC6 hybrid construct originally isolated from the YL3
library, has two BamHI sites. The 5' site created by the hybrid junction
corresponds to the Sau3A site at nt 843. Excision of the segment between the two
sites generated pJL721, leaving amino acids 339-435 in frame with the GAL4AD.
pGAD3R (11), the parent vector for the YL3 library, contains no ORC6 sequence.
pRS425 [Christiansen, et al., Gene 110, 119 (1992)] contains no components of the
fusion protein.

14. All sequencing was performed with Sequenase (USB) on collapsed
double-stranded templates. The protein coding segments of the AAP1-5 hybrid clones
were sequenced from their junction with the GAL4AD to their stop codon. Two of the
ORC6 sequencing primers were used as colony hybridization probes to screen a
high copy number yeast genomic library [M. Carlson and D. Botstein, Cell 28,
145 (1982)] for a clone of the full-length ORC6 gene (pJL724). The full-length
gene was sequenced on both strands using oligonucleotide primers positioned
approximately 200 nt apart.

15. S. P. Bell and B. Stillman, Nature 357, 128 (1992). 

16. Hodgman, Nature 333, 22 (1988); Walker et al., EMBO J. 1, 945 (1982).

17. P. Linder, et al., Nature 337, 121 (1989).

18. E. A. Nigg, Seminars in Cell Biology 2, 261 (1991).

19. ORC6 deletions were constructed by replacing nucleotides 458-1721
(pJL731) or nucleotides 458-846 (pJL733) of the Genbank sequence with the URA3
HindIII fragment oriented in the opposite direction to that of the ORC6 sequence.
Each construct was used to generate heterozygous deletions of ORC6 in diploid
strains by one-step gene replacement. ORC6 deletion analysis was performed in
JLY461 (MATa/MATα ura3/ura3 leu2/leu2 his3/his3 trp1/trp1 ade2/ade2 [cit^J),
JLY462 (MATa/MATα ura3/ura3 leu2/leu2 trp1/trp1 his4/his4 can1/can1), and
JLY463 (MATa/MATα ura3/ura3 leu2/leu2 trp1/trp1 his3/HIS3); their respective
genetic backgrounds are S288c, EG123, and A364a. Disruption of JLY461,
JLY462, and JLY463 by pJL731 (full deletion) created JLY481, JLY475, and
JLY469, respectively. Disruption of JLY461, JLY462, and JLY463 by pJL733
(N-terminal deletion) created JLY485, JLY479, and JLY473, respectively. These
heterozygous marked deletion strains were sporulated, and twenty tetrads of each
were dissected and grown on YEPD to assess viability.

20. Pringle and Hartwell, in The Molecular Biology of the Yeast Saccharomyces,
Strathern, et al., Eds. (CSHL Press, CSH, 1981), vol. 1, pp. 97-142.

21. A point mutant (pJL766) was made by replacing the BamHI-SphI fragment
of the full-length clone with a BamHI-SphI fragment generated by PCR from
pJL720 using primers. One mutation changes nucleotide 1471 of the Genbank
sequence from C to T and was confirmed by sequence analysis.

22. M. M. Wang and R. R. Reed, Nature 364, 121 (1993).

23. T. E. Wilson, et al., Science 252, 1296 (1991).

24. J. F. X. Diffley and J. H. Cocker, Nature 357, 169 (1992).

25. pJL749 contains the GAL1 promoter (nt 146-816) driving the expression of
ORC6 (nt 443-2298) in the high-copy yeast shuttle vector pRS425 [T. W.
Christiansen, et al., Gene 110, 119 (1992)].

26. The cdc mutant strains have been backcrossed 4-5 times against two
congenic strains derived from A364a, RDY487 (MATa leu2 ura3 trp1) and
RDY488 (MATα leu2 ura3 trp1). All are ura3 leu2 trp1. RDY510, RDY664,
JLY310, and JLY179 are MATa; the rest are MATa. Additional markers can be
found in JLY310 (ade2), RDY543 (his3), and RDY619 (pep4Δ::TRP1 his3 ade2).
pJL749, pJL772, and pRS425 (28) were transformed into these strains and plated on
SD-LEU at 22°C. Four colony-purified isolates from each transformation were
patched onto SD-LEU plates and replica-plated to SGAL-LEU plates, all at 22°C.
The patches on SGAL-LEU were replica-plated to a series of pre-warmed
SGAL-LEU plates at 22°, 25°, 27°, 30°, 32.5°, 35°, 37°, and 38°C. The viability of
cdc mutants containing pJL749 was compared to those containing pJL772 and pRS425.

27. Hartwell, JMB 104, 803 (1976); Hennessy, et al., G&D 4, 2252 (1990).

28. Chen, et al., PNAS 89, 10459 (1992); Hogan, et al., ibid. 89, 3098.

29. B.J. Andrews and S.W. Mason, Science 261, 1543 (1993).

Example 4. ORC protein purification and gene cloning

Protein Purification: To obtain sufficient protein for peptide
sequencing, a revised purification procedure for ORC was devised, based on the
procedure reported previously (Bell and Stillman, 1992). Whole cell extract was
prepared from 400 g of frozen BJ926 cells (frozen immediately after harvesting a
300 liter logarithmically growing culture, total of 1.6 kg per 300 liters). All
buffers contained 0.5 mM PMSF, 1 mM benzamidine, 2 mM pepstatin A, 0.1
mg/ml bacitracin and 2 mM DTT. 400 ml of 2X buffer H/0.1 (100 mM
Hepes-KOH, pH 7.5, 0.2 M KCl, 2 mM EDTA, 2 mM EGTA, 10 mM Mg
acetate, and 20% glycerol) was added to the cells, and after thawing the cells were
broken using a bead beater (Biospec Products) until greater than 90% cell breakage
was achieved (twenty 30 second pulses separated by 90 second pauses). After
breakage was complete, the volume of the broken cells was measured and one twelfth
volume of a saturated (at 4°C) solution of ammonium sulfate was added and stirred
for 30 minutes. This solution was then spun at 13,000 x g for 20 minutes. The
resulting supernatant was transferred to 45Ti bottle assemblies (Beckman) and spun
in a 45Ti rotor at 44,000 RPM for 1.5 hrs. The volume of the resulting
supernatant was measured and 0.27 g/ml of ammonium sulfate was added. After
stirring for 30 minutes, the precipitate was collected by spinning in the 45Ti rotor
at 40,000 RPM for 30 minutes. The resulting pellet was resuspended using a
B-pestle dounce in buffer H/0.0 (50 mM Hepes-KOH, pH 7.5, 1 mM EDTA, 1 mM
EGTA, 5 mM Mg acetate, 0.02% NP-40, 10% glycerol) and dialyzed versus
H/0.15 M KCl (Buffer H with 0.15 M KCl added). This preparation typically
yielded 12-16 g soluble protein (determined by Bradford assay with a bovine serum
albumin standard). Preparation of ORC from this extract was essentially as
described (Bell and Stillman, 1992) with the following changes (column sizes used
for preparation of ORC from 400 g of cells are indicated in parentheses). The
S-Sepharose column was loaded at 20 mg protein per ml of resin (~300 ml). The
Q-Sepharose (50 ml) and sequence-specific affinity (5 ml) columns were run as
described, but the dsDNA cellulose column was omitted from the preparation.
Only a single glycerol gradient was performed in an SW-41 rotor spun at 41,000
RPM for 20 hrs. We estimate a yield of 130 µg of ORC complex (all subunits
combined) per 400 g of yeast cells.
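
Several quantities in the extract preparation follow from simple arithmetic on the stated recipe. The following is a minimal sketch under explicit assumptions: that the 2X buffer ends up at roughly 1X once mixed with an approximately equal volume of cell paste, that volumes are additive when the saturated ammonium sulfate solution is added, and that the sample contains no ammonium sulfate beforehand. All function names are illustrative.

```python
# 2X buffer H/0.1 as listed in the text.
BUFFER_2X = {
    "Hepes-KOH (mM)": 100,
    "KCl (M)": 0.2,
    "EDTA (mM)": 2,
    "EGTA (mM)": 2,
    "Mg acetate (mM)": 10,
    "glycerol (%)": 20,
}


def dilute(stock, factor):
    """Concentrations after diluting a stock by the given factor."""
    return {name: conc / factor for name, conc in stock.items()}


def fraction_of_saturation(added_saturated_vol, sample_vol=1.0):
    """Ammonium sulfate saturation after adding saturated solution.

    ASSUMPTION: volumes are additive and the sample starts with no salt.
    """
    return added_saturated_vol / (sample_vol + added_saturated_vol)


def bead_beater_seconds(pulses, pulse_s, pause_s):
    """Elapsed time for pulses separated by pauses (no pause after the last)."""
    return pulses * pulse_s + (pulses - 1) * pause_s


one_x = dilute(BUFFER_2X, 2)
print(one_x)
print(round(100 * fraction_of_saturation(1 / 12), 1))   # 7.7
print(bead_beater_seconds(20, 30, 90) / 60)              # 38.5

# ORC yield relative to soluble protein in the extract (130 ug from 12-16 g).
for grams in (12, 16):
    print(f"{130e-6 / grams:.1e}")
```

The 1X values reproduce the Hepes, EDTA, EGTA, Mg acetate, and glycerol concentrations listed for buffer H/0.0, and the one-twelfth-volume addition corresponds to a cut of roughly 8% saturation under the stated assumptions.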

Protein Sequencing: Digestion of ORC subunits was performed using an
"in gel" protocol described by Kawasaki and Suzuki with some modification.
Briefly, purified ORC (~10 µg per subunit) was first separated by 10% SDS-PAGE
and stained with 0.1% Coomassie Brilliant Blue G (Aldrich) for 15 min.
After destaining (10% methanol, 10% acetic acid), the gel was soaked in water
for one hour, then the protein bands were excised, transferred to a microcentrifuge
tube and cut into 3-5 pieces to fit snugly into the bottom of the tube. A minimum
volume of 0.1 M Tris-HCl (pH 9.0) containing 0.1% SDS was added to
completely cover the gel pieces. Then 200 ng of Achromobacter protease I
(Lysylendopeptidase; Wako) was added and incubated at 30°C for 24 hrs. After
digestion the samples were centrifuged and the supernatant was passed through an
Ultrafree-MC filter (Millipore, 0.22 µm). The gel slices were then washed twice
in 0.1% TFA for one hour and the washes were recovered and filtered as above.
All filtrates were combined and reduced to a volume suitable for injection on the
HPLC using a speed-vac. The digests were separated by reverse-phase HPLC
(Hewlett-Packard 1090 system) using a Vydac C18 column (2.1 x 250 mm, 5 µm,
300 angstroms) with an ion exchange pre-column (Brownlee GAX-013, 3.2 x
15 mm). The peptides were eluted from the C18 column by increasing acetonitrile
concentration and monitored by their absorbance at 214, 280, 295, and 550 nm.
Amino acid sequencing of the purified peptides was performed on an automated
sequencer (Applied Biosystems model 470) with on-line HPLC (Applied
Biosystems model 1020A) analysis of PTH-amino acids.

ORC SUBUNIT CLONING:

ORC1: To clone the gene for the largest (120 kd) subunit of ORC, the
following degenerate oligonucleotide primers 1201 and 1202 were synthesized based
on the sequence of the first ORC1 peptide. These oligos were used to perform
PCR reactions using total yeast genomic DNA from the strain W303 a as target.
A 48 base pair fragment was specifically amplified. This fragment was subcloned
and sequenced. The resulting sequence encoded the predicted peptide, indicating
that it was the correct amplification product. A radioactively labeled form of the
PCR product was then used to probe a genomic library of yeast DNA sequences,
resulting in the identification of two overlapping clones. Sequencing of these
clones resulted in the identification of a large open reading frame that encoded a
protein with a predicted molecular weight of 120 kd and that encoded all four of
the ORC1 peptide sequences.
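
When degenerate primers are designed from peptide sequence, the number of distinct oligonucleotides needed to cover a peptide is the product of the codon counts of its residues, so stretches rich in Met and Trp are preferred. The following is a minimal sketch of that calculation using the standard genetic code; the peptides shown are hypothetical and are not the ORC peptides, and the primer sequences actually used are not reproduced here.

```python
from collections import Counter

# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_PER_AA = dict(Counter(AMINO))   # e.g. M -> 1, K -> 2, L -> 6


def degeneracy(peptide):
    """Number of distinct coding sequences that could encode the peptide."""
    n = 1
    for aa in peptide:
        n *= CODONS_PER_AA[aa]   # raises KeyError for non-standard letters
    return n


def best_window(peptide, width):
    """Least degenerate window of the given width (first one on ties)."""
    windows = [peptide[i:i + width] for i in range(len(peptide) - width + 1)]
    best = min(windows, key=degeneracy)
    return best, degeneracy(best)


# HYPOTHETICAL peptide, used only to illustrate the calculation.
print(degeneracy("MWKDEF"))              # 16
print(degeneracy("LSRGAV"))              # 13824
print(best_window("LSRMWKDEFGAV", 6))    # ('MWKDEF', 16)

# Whole codons spanned by the amplified fragments reported in this example.
for bp in (48, 53, 47):
    print(bp, divmod(bp, 3))
```

A 48 base pair product corresponds to 16 complete codons, which is consistent with amplification across a single sequenced peptide.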

ORC3: To clone the gene for the 62 kd subunit of ORC, the following
degenerate oligonucleotide primers 621 and 624 were synthesized based on the
sequence of the third peptide. These oligos were used to perform PCR reactions
using total yeast genomic DNA from the strain W303 a as target. A 53 base pair
fragment was specifically amplified. This fragment was subcloned and sequenced.
The resulting sequence encoded the predicted peptide, indicating that it was the
correct amplification product. A radioactively labeled form of the PCR product
was then used to probe a genomic library of yeast DNA sequences, resulting in the
identification of two overlapping clones. Sequencing of these clones resulted in
the identification of a large open reading frame that encoded a protein with a
predicted molecular weight of 71 kd and encoded all three of the ORC3 peptide
sequences. The inconsistency of the molecular weight is presumably due to
anomalous migration of this protein during SDS-PAGE.

ORC4: By comparing the sequence of the ORC4 peptides to that of the
known potential protein-coding sequences in the Genbank database, we found that
a portion of the ORC4 coding sequence had been previously cloned in the process
of cloning the adjacent gene. Using the information from the database, we were
able to design a perfect-match oligo and use it to screen a yeast library
immediately. Using this oligo as a probe of the same yeast genomic DNA library, a
lambda clone was isolated that contained the entire ORC4 gene. This gene
encoded a protein of predicted molecular weight 56 kd and also all of the peptides
derived from the peptide sequencing of the 56 kd subunit.

ORC5: To clone the gene for the 53 kd subunit of ORC, the following
degenerate oligonucleotide primers 535 and 536 were synthesized based on the
sequence of the first ORC5 peptide. These oligos were used to perform PCR
reactions using total yeast genomic DNA from the strain W303 a as target. A 47
base pair fragment was specifically amplified. This fragment was subcloned and
sequenced. The resulting sequence encoded the predicted peptide, indicating that it
was the correct amplification product. A radioactively labeled form of the PCR
product was then used to probe a genomic library of yeast DNA sequences,
resulting in the identification of a single lambda clone. Sequencing of this clone
resulted in the identification of a large open reading frame that encoded several
of the peptide sequences derived from the 53 kd subunit of ORC, indicating that
this was the correct gene. However, the sequence of the 5' end of the gene was not
present in this lambda clone. Fortuitously, mutations in the same gene had also
been picked up in the same screen that resulted in the identification of the ORC2
gene. A complementing clone to this mutation was found to overlap with the
lambda clone and contain the entire 5' end of the gene. Sequencing of this
complementing DNA fragment resulted in the identification of the entire sequence
of the ORC5 gene.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application
were specifically and individually indicated to be incorporated by reference.
Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity of understanding, it will be readily
apparent to those of ordinary skill in the art in light of the teachings of this
invention that certain changes and modifications may be made thereto without
departing from the spirit or scope of the appended claims.
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(B) TELEFAX: (415) 398-3249 

(C) TELEX: 910 277299 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4940 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATAACATGCT CGCCCTTTTA TATTATGACA GAAAGAATAT ATATATTCAT ATATAAGATG 60 
CTTCTATTTA TTAGTTTTAT CTTTTAATTG ATGATGTGTC CATAGAATTT AAGTAAGTGC 120 
ATGGTATGGA GTGTATAATG GTTTATAATT TCCCCTAAGA TGACACAAAA AAATGTTCTC 180 
CCAAAAATTT ACCAAGAAAA AAAATTAAGA ATACTACACA ATTGATGCTT GGGTTATTTT 240 

AAATATCCGG TACATTCTAT TACAAATATG TTTGTACAAT GTAAGCCCCT TCATAATGGT 300
CAGTATTAAG ATAAGGACTG CTATGGGGCA TTTTTTGTCT TACTGGGTAT CACAGGATAA 360 
TAACTTGGCG CCAAATTAGA AAAGATATAA ACCTCAAATA TTTGAAATTC TTTGGTGACC 420 
TGTCTCATCG TTATATCAAC AAATATTGCA CCAACGAACA CCACTACATA TGTAACTACT 480 
CTCTTCCTCG ACTTATTTTT TATTAACGTT GACACGGCCA GATCGAAAAT CATAGAAAAA 540 

CAACAACATT GAGAAGAGAT GAAGTTGCGC AAAGGGAAAG AAAACTGCAT AGGCGGCAAA 600
TTCAGCCTAA AAGTTTCCAG AAGCAGGAAC TCATTCCCTA TTGATTAATA CTCATTACAA 660 
AAACCACAAT AGAGTAGATA AGATGGCAAA AACGTTGAAG GATTTACAGG GTTGGGAGAT 720 
AATAACAACT GATGAGCAGG GAAATATAAT CGATGGAGGT CAGAAGAGAT TACGCCGAAG 780 
AGGTGCAAAA ACTGAACATT ACTTAAAGAG AAGTTCTGAT GGAATTAAAC TAGGTCGTGG 840 

TGATAGTGTA GTCATGCACA ACGAAGCCGC TGGGACTTAC TCCGTTTATA TGATCCAGGA 900
GTTGAGACTT AATACATTAA ATAATGTTGT CGAACTCTGG GCTCTCACCT ATTTACGATG 960 
GTTTGAAGTC AATCCTTTAG CTCATTATAG GCAGTTTAAT CCTGACGCTA ACATTTTGAA 1020 
TCGTCCTTTA AATTATTACA ATAAACTGTT TTCTGAAACT GCAAATAAAA ATGAACTGTA 1080
TCTCACTGCA GAATTAGCCG AATTGCAGCT ATTTAACTTT ATCAGGGTTG CCAACGTAAT 1140 
GGATGGAAGC AAATGGGAAG TATTGAAAGG AAATGTCGAT CCAGAAAGAG ACTTTACAGT 1200
TCGTTATATT TGTGAGCCGA CTGGGGAGAA ATTTGTGGAC ATTAATATTG AGGATGTCAA 1260 
AGCTTACATA AAGAAAGTGG AGCCAAGGGA AGCCCAGGAA TATTTGAAAG ATTTAACACT 1320
TCCATCAAAG AAGAAAGAGA TCAAAAGAGG TCCTCAAAAG AAAGATAAGG CTACTCAAAC 1380 
GGCACAAATT TCAGACGCAG AAACAAGAGC TACAGATATA ACGGATAATG AGGACGGTAA 1440 
TGAAGATGAA TCATCTGATT ATGAAAGTCC GTCAGATATC GACGTTAGCG AGGATATGGA 1500
CAGCGGTGAA ATATCCGCAG ATGAGCTTGA GGAAGAAGAA GACGAAGAAG AAGACGAAGA 1560
CGAAGAAGAG AAAGAAGCTA GGCATACAAA TTCACCAAGG AAAAGAGGCC GTAAGATAAA 1620
ACTAGGTAAA GATGATATTG ACGCTTCTGT ACAACCTCCC CCCAAAAAAA GAGGTCGTAA 1680
ACCTAAAGAT CCTAGTAAAC CGCGTCAGAT GCTATTGATA TCTTCATGCC GTGCAAATAA 1740
TACTCCTGTG ATTAGGAAAT TTACAAAAAA GAATGTTGCT AGGGCGAAAA AGAAATATAC 1800
CCCGTTTTCG AAAAGATTTA AATCTATAGC TGCAATACCA GATTTAACTT CATTACCTGA 1860
ATTTTACGGA AATTCTTCGG AATTGATGGC ATCAAGGTTT GAAAACAAAT TAAAAACAAC 1920
CCAAAAGCAT CAGATTGTAG AAACAATTTT TTCTAAAGTC AAAAAACAGT TGAACTCTTC 1980
GTATGTCAAA GAAGAAATAT TGAAGTCTGC AAATTTCCAA GATTATTTAC CGGCTAGGGA 2040
GAATGAATTC GCCTCAATTT ATTTAAGTGC ATATAGTGCC ATTGAGTCCG ACTCCGCTAC 2100
TACTATATAC GTGGCTGGTA CGCCTGGTGT AGGGAAAACT TTAACCGTAA GGGAAGTCGT 2160
AAAGGAACTA CTATCGTCTT CTGCACAACG AGAAATACCA GACTTTCTTT ATGTGGAAAT 2220
AAATGGATTG AAAATGGTAA AACCCACAGA CTGTTACGAA ACTTTATGGA ACAAAGTGTC 2280
AGGAGAAAGG TTAACATGGG CAGCTTCAAT GGAGTCACTA GAGTTTTACT TTAAAAGAGT 2340
TCCAAAAAAT AAGAAGAAAA CCATTGTAGT CTTGTTGGAC GAACTCGATG CCATGGTAAC 2400
GAAATCTCAA GATATTATGT ACAATTTTTT CAATTGGACT ACTTACGAAA ATGCCAAACT 2460
TATTGTCATT GCAGTAGCCA ATACAATGGA CTTACCAGAA CGTCAGCTAG GCAATAAGAT 2520
TACTTCAAGA ATTGGGTTTA CCAGAATTAT GTTCACTGGG TATACGCACG AAGAGCTAAA 2580
AAATATCATT GATTTAAGAC TGAAGGGGTT GAACGACTCA TTTTTCTATG TTGATACAAA 2640
AACTGGCAAT GCTATTTTGA TTGATGCGGC TGGAAACGAC ACTACAGTTA AGCAAACGTT 2700
GCCTGAAGAC GTGAGGAAAG TTCGCTTAAG AATGAGTGCT GATGCCATTG AAATAGCTTC 2760
GAGAAAAGTA GCAAGTGTTA GTGGTGATGC AAGAAGAGCA TTGAAGGTTT GTAAAAGAGC 2820
AGCTGAAATT GCTGAAAAAC ACTATATGGC TAAGCATGGT TATGGATATG ATGGAAAGAC 2880
GGTTATTGAA GATGAAAATG AGGAGCAAAT ATACGATGAT GAAGACAAGG ATCTTATTGA 2940
AAGTAACAAA GCCAAAGACG ATAATGATGA CGATGATGAC AATGATGGGG TACAAACAGT 3000
TCACATCACG CACGTTATGA AAGCCTTAAA CGAAACTTTA AATTCTCATG TAATTACGTT 3060
TATGACGCGA CTTTCATTTA CAGCAAAACT GTTTATTTAT GCATTATTAA ACTTGATGAA 3120
AAAGAACGGA TCTCAAGAGC AAGAACTGGG CGATATTGTC GATGAAATCA AGTTACTTAT 3180
TGAAGTAAAT GGCAGTAATA AGTTTGTCAT GGAGATAGCC AAAACATTGT TCCAACAGGG 3240
AAGTGATAAT ATTTCTGAAC AATTGAGAAT TATATCATGG GATTTCGTTC TCAATCAGTT 3300
ACTTGACGCG GGAATATTGT TTAAACAAAC TATGAAGAAC GATAGAATAT GTTGTGTCAA 3360
GCTAAATATA TCAGTAGAAG AAGCCAAAAG AGCCATGAAT GAGGATGAGA CATTGAGAAA 3420
TTTATAGATT CGGTTTTTAT TATTCATGAC CTAGCATACA CATACATATA CCTACATAGT 3480
AGCGCATTTA TCCAAAACAT ACGATATTGT GGATGTACAT ACCTTCTATA TCTCCTTAAA 3540
GCTATTGTGT AGCTTGATTT AAAATATGCT AACGCCAACT CTCACATGGT AGCAGGCGGG 3600
TATAGTTGTT TTCATGTATT AACGCCCGGC GATGGTGCCT TAGATGAGGG CGACGAGGAG 3660
GGCTTCCTGA TATTATGGCT CTTTCTATCC TGACTTTTGT TATGATGTCG ATGTTGCTGG 3720
CCACCTAGGT GCTTATATAT CAAAAGAGGA TCGCCGATTT CATTGATTTC TGGGATGGTT 3780
AATGTCAAAT TAAAGATCTT TGCCAGTGCA ATTTTGAAAA TTTTTTGAAT GTTTATAGAT 3840 
TTGGCAGTAG AGCAGAATAT AAGAGGAGCA TTCATGACCT GTGCATACTT CATACTCGTT 3900 
CTCGAGATTT GTTCCTGATA TTCCGGGTCT AAGTCTATTA GTAAATCGTA CTTTGTGCCC 3960 
ACCAAAATAG GAATTGCCGA ATCATTTAGC CCGTACGCCT GCCTATACCA CTCCTTTATT 4020 
GAACTCAACG TCTCTGGACG TGTCAGGTCA AACAGAAATA TGATCACTGA AGACCCTACC 4080
GTCGCAATTG GGAGCATGTT GATGAATTCT CTTTGTCCGC CTAAATCCAT TATAGAAAAT 4140 
ATAATATCCG TGGAGCGTAT GCTTACTTTT CTTTTCAAAA AGTTCACTCC CAGCGTCTGT 4200 
GTGTATTCCT TATCGTATAT GTTCTGTACG TACTTCACCA TCAGCGATGT TTTCCCTACT 4260 
TGTGCATCCC CTACTAATCC AACCTGAACT TCAACCTGAT TTCGTACCGC AGGTATAGAA 4320 

TTGTTTGCTC CCGTGCTTGG TGTAGCCATC TTAGCTTAAC TCAATTTAAT TTCTACAGCA 4380
AAATCCAAAC GTAATATCTA TATTTTTCTC GAAAAACTGA GGACAAGAGC CAATCAATCA 4440 
TCTATAATCC AATTTATATT ATTTTTTCCC TTCTGGGTTC TTTTCTTCCT TTTCTTGTTT 4500 
ACCTTTTTTG CTTTTTCATA AAATAATTTC TCTAGATTTG AAGACAGCAT TTTTGTACAT 4560 
CCATACACCA TACACCATAC ACCATAGCAC CAGTACACTA TATTTTTATG AATTTTACTA 4620 

AGAATTATTC CTGCAGGAGC TCCACTGAAA AAAAAAGAGC AGCATGGATG TCATGTCGGT 4680
AGAGTGCTAC TGAGTAAATG GGAGGACGCG GTAGATCCAG TGTGGAATCA AGGTGGTGCC 4740 
GGTGTGAAGC CGCCTCGGCC GGCTGGACTC TCCAGGCCGG AGTGATGATT GCCACGCTGA 4800 

AGCTAACACA GTTTCACAAT ACCAGTGTCC TCATTAGTGA GTTCCAATGT ATAGTTAGTA 4860 
GTGGTATTTT GATATATGTG AGTGGTAGCA GATTTGAACT TAGTTAGTTG TATTCGCCTT 4920 
TGAGGAAACC AAGCCAAAAA 4940

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 914 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Ala Lys Thr Leu Lys Asp Leu Gln Gly Trp Glu Ile Ile Thr Thr
1 5 10 15

Asp Glu Gln Gly Asn Ile Ile Asp Gly Gly Gln Lys Arg Leu Arg Arg
20 25 30

Arg Gly Ala Lys Thr Glu His Tyr Leu Lys Arg Ser Ser Asp Gly Ile
35 40 45

Lys Leu Gly Arg Gly Asp Ser Val Val Met His Asn Glu Ala Ala Gly
50 55 60

Thr Tyr Ser Val Tyr Met Ile Gln Glu Leu Arg Leu Asn Thr Leu Asn
65 70 75 80

Asn Val Val Glu Leu Trp Ala Leu Thr Tyr Leu Arg Trp Phe Glu Val
85 90 95

Asn Pro Leu Ala His Tyr Arg Gln Phe Asn Pro Asp Ala Asn Ile Leu
100 105 110

Asn Arg Pro Leu Asn Tyr Tyr Asn Lys Leu Phe Ser Glu Thr Ala Asn
115 120 125

Lys Asn Glu Leu Tyr Leu Thr Ala Glu Leu Ala Glu Leu Gln Leu Phe
130 135 140

Asn Phe Ile Arg Val Ala Asn Val Met Asp Gly Ser Lys Trp Glu Val
145 150 155 160

Leu Lys Gly Asn Val Asp Pro Glu Arg Asp Phe Thr Val Arg Tyr Ile
165 170 175

Cys Glu Pro Thr Gly Glu Lys Phe Val Asp Ile Asn Ile Glu Asp Val
180 185 190

Lys Ala Tyr Ile Lys Lys Val Glu Pro Arg Glu Ala Gln Glu Tyr Leu
195 200 205

Lys Asp Leu Thr Leu Pro Ser Lys Lys Lys Glu Ile Lys Arg Gly Pro
210 215 220

Gln Lys Lys Asp Lys Ala Thr Gln Thr Ala Gln Ile Ser Asp Ala Glu
225 230 235 240

Thr Arg Ala Thr Asp Ile Thr Asp Asn Glu Asp Gly Asn Glu Asp Glu
245 250 255

Ser Ser Asp Tyr Glu Ser Pro Ser Asp Ile Asp Val Ser Glu Asp Met
260 265 270

Asp Ser Gly Glu Ile Ser Ala Asp Glu Leu Glu Glu Glu Glu Asp Glu
275 280 285

Glu Glu Asp Glu Asp Glu Glu Glu Lys Glu Ala Arg His Thr Asn Ser
290 295 300

Pro Arg Lys Arg Gly Arg Lys Ile Lys Leu Gly Lys Asp Asp Ile Asp
305 310 315 320

Ala Ser Val Gln Pro Pro Pro Lys Lys Arg Gly Arg Lys Pro Lys Asp
325 330 335

Pro Ser Lys Pro Arg Gln Met Leu Leu Ile Ser Ser Cys Arg Ala Asn
340 345 350

Asn Thr Pro Val Ile Arg Lys Phe Thr Lys Lys Asn Val Ala Arg Ala
355 360 365

Lys Lys Lys Tyr Thr Pro Phe Ser Lys Arg Phe Lys Ser Ile Ala Ala
370 375 380

Ile Pro Asp Leu Thr Ser Leu Pro Glu Phe Tyr Gly Asn Ser Ser Glu
385 390 395 400

Leu Met Ala Ser Arg Phe Glu Asn Lys Leu Lys Thr Thr Gln Lys His
405 410 415

Gln Ile Val Glu Thr Ile Phe Ser Lys Val Lys Lys Gln Leu Asn Ser
420 425 430

Ser Tyr Val Lys Glu Glu Ile Leu Lys Ser Ala Asn Phe Gln Asp Tyr
435 440 445

Leu Pro Ala Arg Glu Asn Glu Phe Ala Ser Ile Tyr Leu Ser Ala Tyr
450 455 460

Ser Ala Ile Glu Ser Asp Ser Ala Thr Thr Ile Tyr Val Ala Gly Thr
465 470 475 480

Pro Gly Val Gly Lys Thr Leu Thr Val Arg Glu Val Val Lys Glu Leu
485 490 495

Leu Ser Ser Ser Ala Gln Arg Glu Ile Pro Asp Phe Leu Tyr Val Glu
500 505 510

Ile Asn Gly Leu Lys Met Val Lys Pro Thr Asp Cys Tyr Glu Thr Leu
515 520 525

Trp Asn Lys Val Ser Gly Glu Arg Leu Thr Trp Ala Ala Ser Met Glu
530 535 540

Ser Leu Glu Phe Tyr Phe Lys Arg Val Pro Lys Asn Lys Lys Lys Thr
545 550 555 560

Ile Val Val Leu Leu Asp Glu Leu Asp Ala Met Val Thr Lys Ser Gln
565 570 575

Asp Ile Met Tyr Asn Phe Phe Asn Trp Thr Thr Tyr Glu Asn Ala Lys
580 585 590

Leu Ile Val Ile Ala Val Ala Asn Thr Met Asp Leu Pro Glu Arg Gln
595 600 605

Leu Gly Asn Lys Ile Thr Ser Arg Ile Gly Phe Thr Arg Ile Met Phe
610 615 620

Thr Gly Tyr Thr His Glu Glu Leu Lys Asn Ile Ile Asp Leu Arg Leu
625 630 635 640

Lys Gly Leu Asn Asp Ser Phe Phe Tyr Val Asp Thr Lys Thr Gly Asn
645 650 655

Ala Ile Leu Ile Asp Ala Ala Gly Asn Asp Thr Thr Val Lys Gln Thr
660 665 670

Leu Pro Glu Asp Val Arg Lys Val Arg Leu Arg Met Ser Ala Asp Ala
675 680 685

Ile Glu Ile Ala Ser Arg Lys Val Ala Ser Val Ser Gly Asp Ala Arg
690 695 700

Arg Ala Leu Lys Val Cys Lys Arg Ala Ala Glu Ile Ala Glu Lys His
705 710 715 720

Tyr Met Ala Lys His Gly Tyr Gly Tyr Asp Gly Lys Thr Val Ile Glu
725 730 735

Asp Glu Asn Glu Glu Gln Ile Tyr Asp Asp Glu Asp Lys Asp Leu Ile
740 745 750

Glu Ser Asn Lys Ala Lys Asp Asp Asn Asp Asp Asp Asp Asp Asn Asp
755 760 765

Gly Val Gln Thr Val His Ile Thr His Val Met Lys Ala Leu Asn Glu
770 775 780

Thr Leu Asn Ser His Val Ile Thr Phe Met Thr Arg Leu Ser Phe Thr
785 790 795 800

Ala Lys Leu Phe Ile Tyr Ala Leu Leu Asn Leu Met Lys Lys Asn Gly
805 810 815

Ser Gln Glu Gln Glu Leu Gly Asp Ile Val Asp Glu Ile Lys Leu Leu
820 825 830

Ile Glu Val Asn Gly Ser Asn Lys Phe Val Met Glu Ile Ala Lys Thr
835 840 845

Leu Phe Gln Gln Gly Ser Asp Asn Ile Ser Glu Gln Leu Arg Ile Ile
850 855 860

Ser Trp Asp Phe Val Leu Asn Gln Leu Leu Asp Ala Gly Ile Leu Phe
865 870 875 880

Lys Gln Thr Met Lys Asn Asp Arg Ile Cys Cys Val Lys Leu Asn Ile
885 890 895

Ser Val Glu Glu Ala Lys Arg Ala Met Asn Glu Asp Glu Thr Leu Arg
900 905 910

Asn Leu

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2809 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 807..2666

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

GAGCTCAACA CCACCATTGA GAACGTAGAA TTTCAATTTT TAAGCTGATT CTCTTTCTGC 60
ATGAACTCTC CTAGCAATGT GAAACTTCTC TTAAGGGAAA TTTTCGCCTT TTTGAATGGG 120
CATACTTGGC CAAAAATTCA GGATTGAATA TATATAATCG GAACTTGTAT GGATAAAAAT 180
TTATATCAAG AGTCTGTTTC TTAATTGGAT TTGCTGTGAT CTAGTATTGA GATGACTATA 240
AACCGGCCAG GAAATTAGTC TTTTCGAAGC TGGTTTTGGT TTCGCAAGAG TCTTTTTGAC 300
AGCTTTTTGG CCTCAATTTG TATTCCCTTA ATACGCTTCT TCAACTCTGT CTTAGAGACC 360
ATTTCTCCAG TGGCCTCATC TAGGTGTAAA CTAGCAATAG CGTCACTAGC TGCCGTGACA 420
TTAACTTGCT GTGGCACCTT TATATGTAAT ATGAACCATC TTTCAATGGA TCATAAGAAT 480
AAGTGTCGTA AAAGGCCAAA TATCCATGCA TAAATATCGA CTTATTCGCG TAAATGTGAT 540
ATGGATCAGC TAGTACCAAT TTCTAGTCTA GCAAAATCGG GAAAATTTTT CAGAACACCC 600
ACTCACCGCA TCATTGAGGT GGAAATGACA ATAGTAAGCA GAATTGTTAT TCTTCACAAT 660
GTGTAAAAGT TATAAAGAAA TAGGAACCAC CTTTAAATTA AGACAAAGTA GAATATATTA 720
GCTGAAATTG TATTTGATAA TTGATCATTG ATCTTATTTG CTATATCTTT AAAACAAGTT 780

TTTGTAGTAC TGCGAATTGC CATAAC ATG CTA AAT GGG GAA GAC TTT GTA GAG 833
Met Leu Asn Gly Glu Asp Phe Val Glu
1 5

CAT AAT GAT ATC CTA TCG TCT CCG GCA AAA AGC AGG AAT GTA ACC CCA 881
His Asn Asp Ile Leu Ser Ser Pro Ala Lys Ser Arg Asn Val Thr Pro
10 15 20 25

AAA AGG GTT GAC CCA CAT GGA GAA AGA CAA CTG AGA AGA ATT CAT TCA 929
Lys Arg Val Asp Pro His Gly Glu Arg Gln Leu Arg Arg Ile His Ser
30 35 40

TCA AAG AAG AAT TTG TTG GAA AGA ATC TCG CTT GTA GGC AAC GAA AGG 977
Ser Lys Lys Asn Leu Leu Glu Arg Ile Ser Leu Val Gly Asn Glu Arg
45 50 55

AAA AAT ACA TCT CCA GAT CCG GCA CTC AAA CCT AAA ACG CCA AGT AAA 1025
Lys Asn Thr Ser Pro Asp Pro Ala Leu Lys Pro Lys Thr Pro Ser Lys
60 65 70

GCT CCC CGT AAA CGT GGA AGA CCA AGA AAG ATA CAG GAA GAA TTA ACT 1073
Ala Pro Arg Lys Arg Gly Arg Pro Arg Lys Ile Gln Glu Glu Leu Thr
75 80 85

GAT AGG ATC AAG AAG GAT GAG AAA GAT ACA ATT TCC TCT AAG AAA AAG 1121
Asp Arg Ile Lys Lys Asp Glu Lys Asp Thr Ile Ser Ser Lys Lys Lys
90 95 100 105

AGG AAA TTG GAC AAA GAT ACA TCA GGT AAT GTC AAT GAG GAA AGC AAG 1169
Arg Lys Leu Asp Lys Asp Thr Ser Gly Asn Val Asn Glu Glu Ser Lys
110 115 120

ACT TCT AAC AAC AAG CAG GTG ATG GAA AAG ACG GGG ATA AAA GAG AAA 1217
Thr Ser Asn Asn Lys Gln Val Met Glu Lys Thr Gly Ile Lys Glu Lys
125 130 135

AGA GAA CGC GAA AAA ATA CAG GTA GCG ACC ACA ACA TAT GAA GAT AAT 1265
Arg Glu Arg Glu Lys Ile Gln Val Ala Thr Thr Thr Tyr Glu Asp Asn
140 145 150

GTG ACT CCA CAA ACT GAT GAT AAT TTT GTA TCA AAT TCA CCC GAG CCA 1313
Val Thr Pro Gln Thr Asp Asp Asn Phe Val Ser Asn Ser Pro Glu Pro
155 160 165

CCA GAA CCT GCA ACA CCA TCT AAG AAG TCT TTA ACC ACT AAT CAT GAT 1361
Pro Glu Pro Ala Thr Pro Ser Lys Lys Ser Leu Thr Thr Asn His Asp
170 175 180 185

TTT ACT TCG CCC CTA AAG CAA ATT ATA ATG AAT AAT TTA AAA GAA TAT 1409
Phe Thr Ser Pro Leu Lys Gln Ile Ile Met Asn Asn Leu Lys Glu Tyr
190 195 200

AAA GAC TCA ACC TCC CCA GGT AAA TTA ACC TTG AGT AGA AAT TTT ACT 1457
Lys Asp Ser Thr Ser Pro Gly Lys Leu Thr Leu Ser Arg Asn Phe Thr
205 210 215

CCA ACC CCT GTA CCG AAA AAT AAA AAG CTC TAC CAA ACT TCG GAA ACC 1505
Pro Thr Pro Val Pro Lys Asn Lys Lys Leu Tyr Gln Thr Ser Glu Thr
220 225 230

AAG TCA GCA AGC TCG TTT TTG GAT ACT TTT GAA GGA TAT TTC GAC CAA 1553
Lys Ser Ala Ser Ser Phe Leu Asp Thr Phe Glu Gly Tyr Phe Asp Gln
235 240 245

AGA AAA ATT GTC AGA ACT AAT GCG AAG TCA AGG CAC ACC ATG TCA ATG 1601
Arg Lys Ile Val Arg Thr Asn Ala Lys Ser Arg His Thr Met Ser Met
250 255 260 265

GCA CCT GAC GTT ACC AGA GAA GAG TTT TCC CTA GTA TCA AAC TTT TTC 1649
Ala Pro Asp Val Thr Arg Glu Glu Phe Ser Leu Val Ser Asn Phe Phe
270 275 280

AAC GAA AAT TTT CAA AAA CGT CCC AGG CAA AAG TTA TTT GAA ATT GAG 1697
Asn Glu Asn Phe Gln Lys Arg Pro Arg Gln Lys Leu Phe Glu Ile Gln
285 290 295

AAA AAA ATG TTT CCC CAG TAT TGG TTT GAA TTG ACT CAA GGA TTC TCC 1745
Lys Lys Met Phe Pro Gln Tyr Trp Phe Glu Leu Thr Gln Gly Phe Ser
300 305 310

TTA TTA TTT TAT GGT GTA GGT TCG AAA CGT AAT TTT TTG GAA GAG TTT 1793
Leu Leu Phe Tyr Gly Val Gly Ser Lys Arg Asn Phe Leu Glu Glu Phe
315 320 325

GCC ATT GAC TAC TTG TCT CCG AAA ATC GCG TAC TCG CAA CTG GCT TAT 1841
Ala Ile Asp Tyr Leu Ser Pro Lys Ile Ala Tyr Ser Gln Leu Ala Tyr
330 335 340 345

GAG AAT GAA TTA CAA CAA AAC AAA CCT GTA AAT TCC ATC CCA TGC CTT 1889
Glu Asn Glu Leu Gln Gln Asn Lys Pro Val Asn Ser Ile Pro Cys Leu
350 355 360

ATT TTA AAT GGT TAC AAC CCT AGC TGT AAC TAT CGT GAC GTC TTC AAA 1937
Ile Leu Asn Gly Tyr Asn Pro Ser Cys Asn Tyr Arg Asp Val Phe Lys
365 370 375

GAG ATT ACC GAT CTT TTG GTC CCC GCT GAG TTG ACA AGA AGC GAA ACT 1985
Glu Ile Thr Asp Leu Leu Val Pro Ala Glu Leu Thr Arg Ser Glu Thr
380 385 390

AAG TAC TGG GGC AAT CAT GTG ATT TTG CAG ATC CAA AAG ATG ATT GAT 2033
Lys Tyr Trp Gly Asn His Val Ile Leu Gln Ile Gln Lys Met Ile Asp
395 400 405

TTC TAC AAA AAT CAA CCT TTA GAT ATC AAA TTA ATA CTT GTA GTG CAT 2081
Phe Tyr Lys Asn Gln Pro Leu Asp Ile Lys Leu Ile Leu Val Val His
410 415 420 425

AAT CTG GAT GGT CCT AGC ATA AGG AAA AAC ACT TTT CAG ACG ATG CTA 2129
Asn Leu Asp Gly Pro Ser Ile Arg Lys Asn Thr Phe Gln Thr Met Leu
430 435 440

AGC TTC CTC TCC GTC ATC AGA CAA ATC GCC ATA GTC GCC TCT ACA GAC 2177
Ser Phe Leu Ser Val Ile Arg Gln Ile Ala Ile Val Ala Ser Thr Asp
445 450 455

CAC ATT TAC GCT CCG CTC CTC TGG GAC AAC ATG AAG GCC CAA AAC TAC 2225
His Ile Tyr Ala Pro Leu Leu Trp Asp Asn Met Lys Ala Gln Asn Tyr
460 465 470

AAC TTT GTC TTT CAT GAT ATT TCG AAT TTT GAA CCG TCG ACA GTC GAG 2273
Asn Phe Val Phe His Asp Ile Ser Asn Phe Glu Pro Ser Thr Val Glu
475 480 485

TCT ACG TTC CAA GAT GTG ATG AAG ATG GGT AAA AGC GAT ACC AGC AGT 2321
Ser Thr Phe Gln Asp Val Met Lys Met Gly Lys Ser Asp Thr Ser Ser
490 495 500 505

GGT GCT GAA GGT GCG AAA TAC GTC TTA CAA TCA CTT ACT GTG AAC TCC 2369
Gly Ala Glu Gly Ala Lys Tyr Val Leu Gln Ser Leu Thr Val Asn Ser
510 515 520

AAG AAG ATG TAT AAG TTG CTT ATT GAA ACA CAA ATG CAG AAT ATG GGG 2417
Lys Lys Met Tyr Lys Leu Leu Ile Glu Thr Gln Met Gln Asn Met Gly
525 530 535

AAT CTA TCC GCT AAC ACA GGT CCT AAG CGT GGT ACT CAA AGA ACT GGA 2465
Asn Leu Ser Ala Asn Thr Gly Pro Lys Arg Gly Thr Gln Arg Thr Gly
540 545 550

GTA GAA CTT AAA CTT TTC AAC CAT CTC TGT GCC GCT GAT TTT ATT GCT 2513
Val Glu Leu Lys Leu Phe Asn His Leu Cys Ala Ala Asp Phe Ile Ala
555 560 565

TCT AAT GAG ATA GCT CTA AGG TCG ATG CTT AGA GAA TTC ATA GAA CAT 2561
Ser Asn Glu Ile Ala Leu Arg Ser Met Leu Arg Glu Phe Ile Glu His
570 575 580 585

AAA ATG GCC AAC ATA ACT AAG AAC AAT TCT GGA ATG GAA ATT ATT TGG 2609
Lys Met Ala Asn Ile Thr Lys Asn Asn Ser Gly Met Glu Ile Ile Trp
590 595 600

GTA CCC TAC ACG TAT GCG GAA CTT GAA AAA CTT CTG AAA ACC GTT TTA 2657
Val Pro Tyr Thr Tyr Ala Glu Leu Glu Lys Leu Leu Lys Thr Val Leu
605 610 615

AAT ACT CTA TAAATGTATA CATATCACGA ACAATTGTAA TAGTACTAGG 2706
Asn Thr Leu
620

CTTGCTAGCT TTGCTTTCCC ATAACCAACA ATACTTAGTG ATGTATCTTA AAACGACTAA 2766
AAAACTTCTC ATATAACCCT ACTGAAAAAC GTCTGATGAG CTC 2809

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 620 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Leu Asn Gly Glu Asp Phe Val Glu His Asn Asp Ile Leu Ser Ser
1 5 10 15

Pro Ala Lys Ser Arg Asn Val Thr Pro Lys Arg Val Asp Pro His Gly
20 25 30

Glu Arg Gln Leu Arg Arg Ile His Ser Ser Lys Lys Asn Leu Leu Glu
35 40 45

Arg Ile Ser Leu Val Gly Asn Glu Arg Lys Asn Thr Ser Pro Asp Pro
50 55 60

Ala Leu Lys Pro Lys Thr Pro Ser Lys Ala Pro Arg Lys Arg Gly Arg
65 70 75 80

Pro Arg Lys Ile Gln Glu Glu Leu Thr Asp Arg Ile Lys Lys Asp Glu
85 90 95

Lys Asp Thr Ile Ser Ser Lys Lys Lys Arg Lys Leu Asp Lys Asp Thr
100 105 110

Ser Gly Asn Val Asn Glu Glu Ser Lys Thr Ser Asn Asn Lys Gln Val
115 120 125

Met Glu Lys Thr Gly Ile Lys Glu Lys Arg Glu Arg Glu Lys Ile Gln
130 135 140



WO 95/16694



PCT/US94/14563 



Val Ala Thr Thr Thr Tyr Glu Asp Asn Val Thr Pro Gln Thr Asp Asp
145 150 155 160

Asn Phe Val Ser Asn Ser Pro Glu Pro Pro Glu Pro Ala Thr Pro Ser
165 170 175

Lys Lys Ser Leu Thr Thr Asn His Asp Phe Thr Ser Pro Leu Lys Gln
180 185 190

Ile Ile Met Asn Asn Leu Lys Glu Tyr Lys Asp Ser Thr Ser Pro Gly
195 200 205

Lys Leu Thr Leu Ser Arg Asn Phe Thr Pro Thr Pro Val Pro Lys Asn
210 215 220

Lys Lys Leu Tyr Gln Thr Ser Glu Thr Lys Ser Ala Ser Ser Phe Leu
225 230 235 240

Asp Thr Phe Glu Gly Tyr Phe Asp Gln Arg Lys Ile Val Arg Thr Asn
245 250 255

Ala Lys Ser Arg His Thr Met Ser Met Ala Pro Asp Val Thr Arg Glu
260 265 270

Glu Phe Ser Leu Val Ser Asn Phe Phe Asn Glu Asn Phe Gln Lys Arg
275 280 285

Pro Arg Gln Lys Leu Phe Glu Ile Gln Lys Lys Met Phe Pro Gln Tyr
290 295 300

Trp Phe Glu Leu Thr Gln Gly Phe Ser Leu Leu Phe Tyr Gly Val Gly
305 310 315 320

Ser Lys Arg Asn Phe Leu Glu Glu Phe Ala Ile Asp Tyr Leu Ser Pro
325 330 335

Lys Ile Ala Tyr Ser Gln Leu Ala Tyr Glu Asn Glu Leu Gln Gln Asn
340 345 350

Lys Pro Val Asn Ser Ile Pro Cys Leu Ile Leu Asn Gly Tyr Asn Pro
355 360 365

Ser Cys Asn Tyr Arg Asp Val Phe Lys Glu Ile Thr Asp Leu Leu Val
370 375 380

Pro Ala Glu Leu Thr Arg Ser Glu Thr Lys Tyr Trp Gly Asn His Val
385 390 395 400

Ile Leu Gln Ile Gln Lys Met Ile Asp Phe Tyr Lys Asn Gln Pro Leu
405 410 415

Asp Ile Lys Leu Ile Leu Val Val His Asn Leu Asp Gly Pro Ser Ile
420 425 430

Arg Lys Asn Thr Phe Gln Thr Met Leu Ser Phe Leu Ser Val Ile Arg
435 440 445

Gln Ile Ala Ile Val Ala Ser Thr Asp His Ile Tyr Ala Pro Leu Leu
450 455 460

Trp Asp Asn Met Lys Ala Gln Asn Tyr Asn Phe Val Phe His Asp Ile
465 470 475 480

Ser Asn Phe Glu Pro Ser Thr Val Glu Ser Thr Phe Gln Asp Val Met
485 490 495

Lys Met Gly Lys Ser Asp Thr Ser Ser Gly Ala Glu Gly Ala Lys Tyr
500 505 510

Val Leu Gln Ser Leu Thr Val Asn Ser Lys Lys Met Tyr Lys Leu Leu
515 520 525

Ile Glu Thr Gln Met Gln Asn Met Gly Asn Leu Ser Ala Asn Thr Gly
530 535 540

Pro Lys Arg Gly Thr Gln Arg Thr Gly Val Glu Leu Lys Leu Phe Asn
545 550 555 560

His Leu Cys Ala Ala Asp Phe Ile Ala Ser Asn Glu Ile Ala Leu Arg
565 570 575

Ser Met Leu Arg Glu Phe Ile Glu His Lys Met Ala Asn Ile Thr Lys
580 585 590

Asn Asn Ser Gly Met Glu Ile Ile Trp Val Pro Tyr Thr Tyr Ala Glu
595 600 605

Leu Glu Lys Leu Leu Lys Thr Val Leu Asn Thr Leu
610 615 620

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2759 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TCTGAAATAA AAAGTACAAA AAAGAAAACA ATATACCAGA TATGAACCCT TTTAGTGAGA 60
TTCCAGCATG TCTTTGCGCA GATCCAAATC TTTCTTTGTC TTGAAATTTA TTCAGTAAAT 120 
TAAAAGTCAG TTCTTTAGTA GCATTCATCT TCTTGGTAAG TCTTTTTCTT GTTTTTGAAA 180 
AAGAGTTCCT GAAGTTTGTC TACTGTGAAT ATACTTTGCA CATTTGTTTA ATTTTTAAAC 240 
ACGCTATAAT TTGTGTCATA AAGAATTTTT TGTAGAATAG CTTTTTTTTT AATAGGAAAA 300 
AAAAATAAAA AAAGGTGGAA AAGACAATCT TTTCCAGAAA CTTGAAACTA TACTGGAGAT 360
GAAGGGTTGT CGTTGGTTGC GTTACGAGAC AGGCTTGACA ATTTCACAAG AGTAATGTTT 420 
CATTACCTGC TGTTTTATTA TCTTTATATT TAGTAAGACC AGCAGAAACG CTACACGTGA 480 
TGATAATGGA ACTAAGCATT CTGTTAGATG GTAAGAATTT TTTTTACCTT CCATTACCAC 540 
TAACGCCTTT TTTAGTGTCT TTTTGATATT TACTGACGTA TTTTTCCGCA CCGTAATTTG 600 
AAGAAAAAGA AAAGTGACAA AAGATGGCAT TGTTTACATA CAGAGTCGTA GTATCACAAG 660
AGTAGTCCAA CAGGATGAGC GACCTTAACC AATCCAAAAA GATGAACGTC AGCGAGTTTG 720 
CTGACGCCCA AAGGAGCCAC TATACAGTAT ACCCCAGTTT GCCTCAAAGT AACAAAAATG 780 

ATAAACACAT TCCCTTTGTC AAACTTCTAT CAGGCAAAGA ATCGGAAGTG AACGTGGAAA 840 
AAAGATGGGA ATTGTATCAT CAGTTACATT CCCACTTTCA TGATCAAGTA GATCATATTA 900 
TCGATAATAT TGAAGCAGAC TTGAAAGCAG AGATTTCAGA CCTTTTATAT AGTGAAACTA 960
CTCAGAAAAG GCGATGCTTT AACACTATTT TCCTATTAGG TTCAGATAGT ACGACAAAAA 1020
TTGAACTTAA AGACGAATCT TCTCGCTACA ACGTTTTGAT TGAATTGACT CCGAAAGAAT 1080
CTCCGAATGT AAGAATGATG CTTCGTAGGT CTATGTACAA ACTTTACAGC GCAGCTGATG 1140
CAGAAGAACA TCCAACTATC AAGTATGAAG ACATTAACGA TGAAGATGGC GATTTTACCG 1200
AGCAAAACAA TGATGTATCA TACGATCTGT CACTTGTGGA AAACTTCAAA AGGCTTTTTG 1260
GAAAAGACTT AGCAATGGTA TTTAATTTTA AAGATGTAGA TTCTATTAAC TTCAACACAT 1320
TGGATAACTT CATAATTCTA TTGAAAAGTG CCTTCAAGTA TGACCATGTT AAAATAAGTT 1380
TAATCTTTAA TATTAATACA AACTTGTCAA ATATTGAGAA AAATTTGAGA CAATCAACCA 1440
TACGACTTCT GAAGAGAAAT TATCATAAAC TAGACGTGTC GAGTAATAAA GGATTTAAGT 1500
ACGGAAACCA AATCTTTCAA AGCTTTTTGG ATACGGTTGA TGGCAAACTA AATCTTTCAG 1560
ATCGTTTTGT GGAATTCATT CTCAGCAAGA TGGCAAATAA TACTAATCAC AACTTACAAT 1620
TATTGACGAA GATGCTGGAT TATTCGTTGA TGTCGTACTT TTTCCAGAAT GCCTTTTCAG 1680
TATTCATTGA CCCTGTAAAT GTTGATTTTT TGAACGACGA CTACTTAAAA ATACTGAGCA 1740
GATGTCCTAC ATTCATGTTC TTTGTCGAAG GTCTTATAAA GCAGCATGCT CCTGCTGACG 1800
AAATTCTTTC ATTATTGACA AACAAAAACA GAGGCCTAGA AGAGTTTTTT GTTGAGTTTT 1860
TGGTAAGAGA GAACCCGATT AACGGGCATG CTAAGTTTGT TGCTCGATTC CTCGAAGAAG 1920
AATTGAATAT AACCAATTTT AATCTGATAG AATTATATCA TAATTTGCTT ATTGGCAAAC 1980
TAGACTCCTA TCTAGATCGT TGGTCAGCAT GTAAAGAGTA TAAGGATCGG CTTCATTTTG 2040
AACCCATTGA TACAATTTTT CAAGAGCTAT TTACTTTGGA CAACAGAAGT GGATTACTTA 2100
CCCAGTCGAT TTTCCCTTCT TACAAGTCAA ATATCGAAGA TAACTTACTA AGTTGGGAGC 2160
AGGTGCTGCC TTCGCTTGAT AAAGAAAATT ATGATACTCT TTCTGGAGAT TTGGATAAAA 2220
TAATGGCTCC GGTACTGGGT CAGCTATTCA AGCTTTATCG TGAGGCGAAT ATGACTATCA 2280
ACATTTACGA TTTCTACATT GCGTTCAGAG AAACATTACC AAAAGAGGAA ATATTAAATT 2340
TCATAAGAAA AGATCCCTCC AACACCAAAC TCTTAGAACT AGCAGAAACA CCGGACGCAT 2400
TTGACAAAGT AGCACTAATT TTATTCATGC AAGCAATCTT CGCCTTTGAA AACATGGGTC 2460
TCATTAAGTT TCAAAGCACC AAGAGTTACG ATCTGGTAGA AAAATGTGTC TGGAGAGGAA 2520
XT TAG AT AAA GAATGCACGG A 1 AAATAAGT AAATAAATAA CCATACATAT ATAGAACCAT 2580
AGAACCACGT TTTTGTAATG AACAGTCTAC CTGTATCTCA TCATTTTTCT GTGTTAACTA 2640
TTATTATTAT TATTATCGAA TGGAGGGTAA TATTATGTAT AGGTAAAATA AATAGATAGT 2700
GCCATGATGC GCGAAGATTG GCAATGGGAA ACTCAAGAAG GCAGCAACAA AAAAATAAA 2759




(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 615 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Met Ser Asp Leu Asn Gln Ser Lys Lys Met Asn Val Ser Glu Phe Ala
1 5 10 15

Asp Ala Gln Arg Ser His Tyr Thr Val Tyr Pro Ser Leu Pro Gln Ser
20 25 30

Asn Lys Asn Asp Lys His Ile Pro Phe Val Lys Leu Leu Ser Gly Lys
35 40 45

Glu Ser Glu Val Asn Val Glu Lys Arg Trp Glu Leu Tyr His Gln Leu
50 55 60

His Ser His Phe His Asp Gln Val Asp His Ile Ile Asp Asn Ile Glu
65 70 75 80

Ala Asp Leu Lys Ala Glu Ile Ser Asp Leu Leu Tyr Ser Glu Thr Thr
85 90 95

Gln Lys Arg Arg Cys Phe Asn Thr Ile Phe Leu Leu Gly Ser Asp Ser
100 105 110

Thr Thr Lys Ile Glu Leu Lys Asp Glu Ser Ser Arg Tyr Asn Val Leu
115 120 125

Ile Glu Leu Thr Pro Lys Glu Ser Pro Asn Val Arg Met Met Leu Arg
130 135 140

Arg Ser Met Tyr Lys Leu Tyr Ser Ala Ala Asp Ala Glu Glu His Pro
145 150 155 160

Thr Ile Lys Tyr Glu Asp Ile Asn Asp Glu Asp Gly Asp Phe Thr Glu
165 170 175

Gln Asn Asn Asp Val Ser Tyr Asp Leu Ser Leu Val Glu Asn Phe Lys
180 185 190

Arg Leu Phe Gly Lys Asp Leu Ala Met Val Phe Asn Phe Lys Asp Val
195 200 205

Asp Ser Ile Asn Phe Asn Thr Leu Asp Asn Phe Ile Ile Leu Leu Lys
210 215 220

Ser Ala Phe Lys Tyr Asp His Val Lys Ile Ser Leu Ile Phe Asn Ile
225 230 235 240

Asn Thr Asn Leu Ser Asn Ile Glu Lys Asn Leu Arg Gln Ser Thr Ile
245 250 255

Arg Leu Leu Lys Arg Asn Tyr His Lys Leu Asp Val Ser Ser Asn Lys
260 265 270

Gly Phe Lys Tyr Gly Asn Gln Ile Phe Gln Ser Phe Leu Asp Thr Val
275 280 285

Asp Gly Lys Leu Asn Leu Ser Asp Arg Phe Val Glu Phe Ile Leu Ser
290 295 300

Lys Met Ala Asn Asn Thr Asn His Asn Leu Gln Leu Leu Thr Lys Met
305 310 315 320

Leu Asp Tyr Ser Leu Met Ser Tyr Phe Phe Gln Asn Ala Phe Ser Val
325 330 335

Phe Ile Asp Pro Val Asn Val Asp Phe Leu Asn Asp Asp Tyr Leu Lys
340 345 350

Ile Leu Ser Arg Cys Pro Thr Phe Met Phe Phe Val Glu Gly Leu Ile
355 360 365

Lys Gln His Ala Pro Ala Asp Glu Ile Leu Ser Leu Leu Thr Asn Lys
370 375 380

Asn Arg Gly Leu Glu Glu Phe Phe Val Glu Phe Leu Val Arg Glu Asn
385 390 395 400

Pro Ile Asn Gly His Ala Lys Phe Val Ala Arg Phe Leu Glu Glu Glu
405 410 415

Leu Asn Ile Thr Asn Phe Asn Leu Ile Glu Leu Tyr His Asn Leu Leu
420 425 430

Ile Gly Lys Leu Asp Ser Tyr Leu Asp Arg Trp Ser Ala Cys Lys Glu
435 440 445

Tyr Lys Asp Arg Leu His Phe Glu Pro Ile Asp Thr Ile Phe Gln Glu
450 455 460

Leu Phe Thr Leu Asp Asn Arg Ser Gly Leu Leu Thr Gln Ser Ile Phe
465 470 475 480

Pro Ser Tyr Lys Ser Asn Ile Glu Asp Asn Leu Leu Ser Trp Glu Gln
485 490 495

Val Leu Pro Ser Leu Asp Lys Glu Asn Tyr Asp Thr Leu Ser Gly Asp
500 505 510

Leu Asp Lys Ile Met Ala Pro Val Leu Gly Gln Leu Phe Lys Leu Tyr
515 520 525

Arg Glu Ala Asn Met Thr Ile Asn Ile Tyr Asp Phe Tyr Ile Ala Phe
530 535 540

Arg Glu Thr Leu Pro Lys Glu Glu Ile Leu Asn Phe Ile Arg Lys Asp
545 550 555 560

Pro Ser Asn Thr Lys Leu Leu Glu Leu Ala Glu Thr Pro Asp Ala Phe
565 570 575

Asp Lys Val Ala Leu Ile Leu Phe Met Gln Ala Ile Phe Ala Phe Glu
580 585 590

Asn Met Gly Leu Ile Lys Phe Gln Ser Thr Lys Ser Tyr Asp Leu Val
595 600 605

Glu Lys Cys Val Trp Arg Gly
610 615

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2404 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
65 CTCGAGGCCA CCAAGAAGAG AAAGAGAAGA GCCAGATATT GACTGGAGTG CAGCCAGAGG 60 
TTCCAACTTC CAAAGCTCCT CGGAGCCACC AAGAAGAGAA AGAGAAAAGG AAGAACCAGC 120 
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TTTGGATTGG GGTGCTGCCA GAGGTGCTCA 

CTACAAGGAT AGGTCTCTAA CTAACAAAAA 

5 GTCTGTTTAT GATGTTTTAC GTACTGAAGA 

AAATGGAGAC GCAAAAGAAA ACAAAGTTGA 

TGCTCAATTG ACTGTTGAAG ATGGTGACAA 

10 

GTTTGGTAAG CCTCAACAAA CCAAAAATAC 180
GACTACTGAT GAGCAACCAA AAATCCAGAA 240
TGATGATGAA GATGAAGAGG CTGAAAAGCA 300
TGCGGCAGTT GAAAAGCTAC AGGATAAAAC 360
TTGGGAAGTT GTTGGTAAGA AATAGAGTGT 420
TGTATGATGA TAAAATGTAC ATTTGTATTT ACTGTTTGCT TTTTTTCTTT CTTGTTTTTC 480
TACTCTCCTT TCTACCAGGT ATTCTAACTC TATTATATAA TTAAAAAAAA AATAACCATA 540
TATTTTGTAT TAAGTTTCAT ACATGTGTTC AAGTGTATTT TTGGATTTAT CATTTTTCTA 600
TGTGAGGTAA GTTTTTGAAT GTCCCATTTT CCTTTCGTTT TTGGAAAGTT CTAAGAAAAA 660
GCATTAACAA TTAAAAAAAA AAAAAAAATC TAAATAATAC TGATAGAAAT ATCAAATATA 720
AACTACTAAT ATCGGTAATA TTCAAAAGAA GAAGCATGAC TATAAGCGAA GCTCGTCTAT 780
CACCGCAAGT CAATCTTCTC CCAATAAAGA GGCACTCAAA CGAAGAGGTA GAGGAGACTG 840
CAGCGATTCT AAAAAAGCGT ACTATAGATA ATGAAAAGTG TAAAGACAGC GACCCTGGTT 900
TTGGTTCCCT TCAAAGAAGG TTACTGCAGC AACTTTATGG CACACTTCCT ACGGACGAAA 960
AGATAATCTT CACATATTTA CAAGATTGTC AACAAGAGAT CGATAGAATC ATTAAACAAT 1020
CCATTATTCA GAAAGAGAGT CATTCAGTAA TTCTCGTGGG GCCCAGACAA AGTTACAAAA 1080
CATACTTATT AGACTATGAA CTGTCTTTGT TGCAACAATC TTATAAAGAG CAGTTTATAA 1140
CTATCAGGTT GAATGGGTTT ATTCACTCCG AACAAACAGC TATTAACGGT ATAGCAACTC 1200
AATTGGAACA GCAGTTGCAG AAAATTCATG GCAGTGAAGA AAAAATTGAC GATACTTCAT 1260
TAGAGACTAT TAGCAGTGGT TCTTTGACAG AAGTGTTTGA GAAAATTCTT TTACTCTTAG 1320
ATTCGACCAC GAAGACAAGA AATGAAGATA GTGGTGAGGT TGACAGAGAG AGTATAACAA 1380
AGATAACAGT TGTTTTTATA TTCGATGAAA TTGATACATT TGCTGGGCCT GTGAGGCAAA 1440
CTTTATTATA CAATCTTTTT GACATGGTAG AACATTCTCG GGTACCTGTT TGCATTTTTG 1500
GCTGCACAAC GAAATTAAAT ATCTTGGAAT ATTTAGAAAA GAGGGTAAAG AGTAGATTTT 1560
CTCAAAGAGT GATTTATATG CCGCAAATAC AGAATCTAGA CGATATGGTT GACGCCGTCA 1620
GAAATTTACT TACAGTTCGC TCTGAAATCT CCCCCTGGGT TTCACAATGG AATGAAACGT 1680
TGGAAAAAGA ACTATCCGAC CCTCGATCGA ATTTGAATAG ACATATTAGG ATGAATTTCG 1740
AAACCTTTAG GTCATTACCT ACATTGAAAA ATAGCATAAT TCCATTAGTA GCGACATCCA 1800
AAAATTTTGG TTCACTCTGC ACTGCCATAA AATCGTGTTC TTTTCTTGAC ATATACAATA 1860
AGAACCAACT ATCTAATAAT TTAACAGGAA GGCTCCAATC TTTATCCGAT TTAGAGTTAG 1920
CCATTTTGAT CTCAGCCGCT AGGGTTGCCT TAAGGGCGAA AGACGGATCT TTTAATTTTA 1980
ATTTAGCTTA TGCAGAGTAT GAAAAGATGA TTAAAGCTAT CAACTCCAGA ATTCCCACCG 2040
TGGCTCCTAC TACAAATGTG GGAACAGGTC AAAGTACTTT TTCTATCGAC AATACTATCA 2100
AACTATGGTT GAAAAAGGAC GTCAAGAACG TTTGGGAAAA TTTAGTGCAA CTGGATTTTT 2160
TTACCGAGAA ATCAGCCGTT GGTTTGAGAG ATAATGCGAC CGCAGCATTT TACGCTAGCA 2220
ATTATCAATT TCAGGGCACC ATGATCCCGT TTGACTTGAG AAGTTACCAG ATGCAGATCA 2280
TTCTTCAGGA ATTAAGAAGA ATTATCCCCA AATCTAATAT GTACTACTCC TGGACACAAC 2340
TGTGAATCTT GGGAACAATA TACAGACATT TTATTGGCGG TAGCAACTCT GATATTCCAC 2400
TGTT 2404

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 529 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Thr Ile Ser Glu Ala Arg Leu Ser Pro Gln Val Asn Leu Leu Pro
1 5 10 15

Ile Lys Arg His Ser Asn Glu Glu Val Glu Glu Thr Ala Ala Ile Leu
20 25 30

Lys Lys Arg Thr Ile Asp Asn Glu Lys Cys Lys Asp Ser Asp Pro Gly
35 40 45

Phe Gly Ser Leu Gln Arg Arg Leu Leu Gln Gln Leu Tyr Gly Thr Leu
50 55 60

Pro Thr Asp Glu Lys Ile Ile Phe Thr Tyr Leu Gln Asp Cys Gln Gln
65 70 75 80

Glu Ile Asp Arg Ile Ile Lys Gln Ser Ile Ile Gln Lys Glu Ser His
85 90 95

Ser Val Ile Leu Val Gly Pro Arg Gln Ser Tyr Lys Thr Tyr Leu Leu
100 105 110

Asp Tyr Glu Leu Ser Leu Leu Gln Gln Ser Tyr Lys Glu Gln Phe Ile
115 120 125

Thr Ile Arg Leu Asn Gly Phe Ile His Ser Glu Gln Thr Ala Ile Asn
130 135 140

Gly Ile Ala Thr Gln Leu Glu Gln Gln Leu Gln Lys Ile His Gly Ser
145 150 155 160

Glu Glu Lys Ile Asp Asp Thr Ser Leu Glu Thr Ile Ser Ser Gly Ser
165 170 175

Leu Thr Glu Val Phe Glu Lys Ile Leu Leu Leu Leu Asp Ser Thr Thr
180 185 190

Lys Thr Arg Asn Glu Asp Ser Gly Glu Val Asp Arg Glu Ser Ile Thr
195 200 205

Lys Ile Thr Val Val Phe Ile Phe Asp Glu Ile Asp Thr Phe Ala Gly
210 215 220

Pro Val Arg Gln Thr Leu Leu Tyr Asn Leu Phe Asp Met Val Glu His
225 230 235 240

Ser Arg Val Pro Val Cys Ile Phe Gly Cys Thr Thr Lys Leu Asn Ile
245 250 255

Leu Glu Tyr Leu Glu Lys Arg Val Lys Ser Arg Phe Ser Gln Arg Val
260 265 270

Ile Tyr Met Pro Gln Ile Gln Asn Leu Asp Asp Met Val Asp Ala Val
275 280 285

Arg Asn Leu Leu Thr Val Arg Ser Glu Ile Ser Pro Trp Val Ser Gln
290 295 300

Trp Asn Glu Thr Leu Glu Lys Glu Leu Ser Asp Pro Arg Ser Asn Leu
305 310 315 320

Asn Arg His Ile Arg Met Asn Phe Glu Thr Phe Arg Ser Leu Pro Thr
325 330 335

Leu Lys Asn Ser Ile Ile Pro Leu Val Ala Thr Ser Lys Asn Phe Gly
340 345 350

Ser Leu Cys Thr Ala Ile Lys Ser Cys Ser Phe Leu Asp Ile Tyr Asn
355 360 365

Lys Asn Gln Leu Ser Asn Asn Leu Thr Gly Arg Leu Gln Ser Leu Ser
370 375 380

Asp Leu Glu Leu Ala Ile Leu Ile Ser Ala Ala Arg Val Ala Leu Arg
385 390 395 400

Ala Lys Asp Gly Ser Phe Asn Phe Asn Leu Ala Tyr Ala Glu Tyr Glu
405 410 415

Lys Met Ile Lys Ala Ile Asn Ser Arg Ile Pro Thr Val Ala Pro Thr
420 425 430

Thr Asn Val Gly Thr Gly Gln Ser Thr Phe Ser Ile Asp Asn Thr Ile
435 440 445

Lys Leu Trp Leu Lys Lys Asp Val Lys Asn Val Trp Glu Asn Leu Val
450 455 460

Gln Leu Asp Phe Phe Thr Glu Lys Ser Ala Val Gly Leu Arg Asp Asn
465 470 475 480

Ala Thr Ala Ala Phe Tyr Ala Ser Asn Tyr Gln Phe Gln Gly Thr Met
485 490 495

Ile Pro Phe Asp Leu Arg Ser Tyr Gln Met Gln Ile Ile Leu Gln Glu
500 505 510

Leu Arg Arg Ile Ile Pro Lys Ser Asn Met Tyr Tyr Ser Trp Thr Gln
515 520 525

Leu



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2306 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GCTATTTTTT CATGCGTCAG ATGTCACAAA GCCTTTAATC AAGTATTGTT GCAAGAACAC 60
CTGATTCAAA AACTACGTTC TGATATCGAA TCCTATTTAA TTCAAGATTT GAGATGCTCC 120
AGATGTCATA AAGTGAAACG TGACTATATG AGTGCCCACT GTCCATGTGC CGGCGCGTGG 180
GAAGGAACTC TCCCCAGAGA AAGCATTGTT CAAAAGTTAA ATGTGTTTAA GCAAGTAGCC 240
AAGTATTACG GTTTTGATAT ATTATTGAGT TGTATTGCTG ATTTGACCAT ATGAGTAAGC 300
AGTATATAAC GCGAGGTTCA ATGGCCTCTT TACCATGAAA AAAAAAAAAA AAAAAAAAAA 360
AAGGTAAGGA AAAAGAGTAT TTTCAATTCG TTTCTGAACA TATAAATATA AATAACCGAA 420
AAATTAGCCC TTGAACATAA TTAACACTCT TCTTTGATAT TTAAATCACA AGTACTTTTC 480
TTTTATTTTC TTCTTAATAC TTTTGGAAAT AAAATGAATG TGACCACTCC GGAAGTTGCT 540
TTTAGGGAAT ATCAAACCAA CTGTCTCGCA TCGTATATTT CTGCTGATCC AGACATAACT 600
CCTTCAAATT TAATCTTGCA AGGTTATAGT GGAACAGGAA AAACCTACAC TTTGAAGAAG 660
TATTTTAATG CGAATCCAAA TTTGCATGCA GTATGGCTGG AACCTGTTGA GTTGGTTTCT 720
TGGAAGCCCT TACTGCAGGC GATAGCACGT ACTGTACAAT ATAAATTGAA AACCCTATAT 780
CCAAACATTC CCACCACAGA TTACGATCCT TTACAGGTTG AAGAGCCATT TCTTTTGGTA 840
AAGACGTTGC ACAATATTTT TGTCCAATAT GAATCTTTGC AAGAAAAGAC TTGCTTGTTC 900
TTGATATTGG ATGGTTTCGA TAGTTTACAA GATTTAGACG CCGCACTGTT TAACAAATAT 960
ATCAAACTAA ATGAATTACT TCCAAAAGAT TCTAAAATTA ATATAAAATT CATTTACACG 1020
ATGTTAGAGA CATCATTTTT GCAAAGATAT TCTACACATT GCATTCCAAC TGTTATGTTT 1080
CCGAGGTATA ATGTGGACGA AGTTTCTACT ATATTAGTGA TGTCTAGATG TGGCGAACTC 1140
ATGGAAGATT CTTGTCTACG TAAGCGTATC ATTGAAGAGC AGATAACGGA CTGTACAGAC 1200
GATCAATTTC AAAATGTAGC TGCGAACTTC ATTCACTTAA TTGTGCAGGC TTTTCATTCT 1260
TATACTGGAA ACGACATATT CGCATTGAAT GACTTGATAG ACTTCAAATG GCCCAAGTAT 1320
GTATCTCGCA TTACTAAGGA AAACATATTT GAACCACTGG CTCTTTACAA AAGTGCCATC 1380
AAACTATTTT TAAGCACAGA TGATAATTTA AGTGAAAATG GACAAGGTGA AAGCGCGATA 1440
ACCACAAATC GTGATGACCT TGAGAACAGT CAAACTTACG ACTTATCAAT AATTTCGAAG 1500
TATCTGCTCA TAGCCTCATA TATTTGTTCA TATCTGGAAC CTAGATACGA TGCGAGTATT 1560
TTCTCTAGGA AAACACGTAT CATACAAGGT AGAGCTGCTT ATGGACGAAG AAAGAAGAAA 1620
GAAGTTAACC CTAGATATTT ACAGCCTTCT TTATTTGCTA TTGAAAGACT TTTGGCTATT 1680
TTCCAAGCTA TATTCCCTAT TCAAGGTAAG GCGGAGAGTG GTTCCCTATC TGCACTTCGT 1740
GAGGAATCCT TAATGAAAGC GAATATCGAG GTTTTTCAAA ATTTATCCGA ATTGCATACA 1800
TTGAAATTAA TAGCTACAAC CATGAACAAG AATATCGACT ATTTGAGTCC TAAAGTCAGG 1860
TGGAAAGTAA ACGTTCCCTG GGAAATTATT AAAGAAATAT CAGAATCTGT TCATTTCAAT 1920
ATCAGCGATT ACTTCAGCGA TATTCACGAA TGATTATCTC CCTGGAAGGT ATCCAGAGGG 1980
CAGGATACGT TCGAAACAAC AACTACGTTA TATAAATATT TATACATAGT GGGATAGAAT 2040
GAACAATTAT CAAGTAAACC TTGTATTTTT TGTTCCCACG CTCTACGCTC TGTTTCTTGG 2100
ATATGGTAAT CAAAGATTAA TACGTATAAC CGTTATTAAT TCAGTCCACT AGAAACTATT 2160
AAAAGCGCCC TACTGTATGG AAAAACAATG AATGAGGAGA CTGAACGGCG CAAAATTGTT 2220
AGTTTAGTTG CTCTTTTTGG CGGCCGGCGA TAATGTTCTT CACTTGGTAT TCTTACCAGG 2280
ATTGAGCCTG ATTTTGTTTT GTCTTA 2306

(2) INFORMATION FOR SEQ ID NO: 10: 



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 479 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Asn Val Thr Thr Pro Glu Val Ala Phe Arg Glu Tyr Gln Thr Asn
1 5 10 15

Cys Leu Ala Ser Tyr Ile Ser Ala Asp Pro Asp Ile Thr Pro Ser Asn
20 25 30

Leu Ile Leu Gln Gly Tyr Ser Gly Thr Gly Lys Thr Tyr Thr Leu Lys
35 40 45

Lys Tyr Phe Asn Ala Asn Pro Asn Leu His Ala Val Trp Leu Glu Pro
50 55 60

Val Glu Leu Val Ser Trp Lys Pro Leu Leu Gln Ala Ile Ala Arg Thr
65 70 75 80

Val Gln Tyr Lys Leu Lys Thr Leu Tyr Pro Asn Ile Pro Thr Thr Asp
85 90 95

Tyr Asp Pro Leu Gln Val Glu Glu Pro Phe Leu Leu Val Lys Thr Leu
100 105 110

His Asn Ile Phe Val Gln Tyr Glu Ser Leu Gln Glu Lys Thr Cys Leu
115 120 125

Phe Leu Ile Leu Asp Gly Phe Asp Ser Leu Gln Asp Leu Asp Ala Ala
130 135 140

Leu Phe Asn Lys Tyr Ile Lys Leu Asn Glu Leu Leu Pro Lys Asp Ser
145 150 155 160

Lys Ile Asn Ile Lys Phe Ile Tyr Thr Met Leu Glu Thr Ser Phe Leu
165 170 175

Gln Arg Tyr Ser Thr His Cys Ile Pro Thr Val Met Phe Pro Arg Tyr
180 185 190

Asn Val Asp Glu Val Ser Thr Ile Leu Val Met Ser Arg Cys Gly Glu
195 200 205

Leu Met Glu Asp Ser Cys Leu Arg Lys Arg Ile Ile Glu Glu Gln Ile
210 215 220

Thr Asp Cys Thr Asp Asp Gln Phe Gln Asn Val Ala Ala Asn Phe Ile
225 230 235 240

His Leu Ile Val Gln Ala Phe His Ser Tyr Thr Gly Asn Asp Ile Phe
245 250 255

Ala Leu Asn Asp Leu Ile Asp Phe Lys Trp Pro Lys Tyr Val Ser Arg
260 265 270

Ile Thr Lys Glu Asn Ile Phe Glu Pro Leu Ala Leu Tyr Lys Ser Ala
275 280 285

Ile Lys Leu Phe Leu Ser Thr Asp Asp Asn Leu Ser Glu Asn Gly Gln
290 295 300

Gly Glu Ser Ala Ile Thr Thr Asn Arg Asp Asp Leu Glu Asn Ser Gln
305 310 315 320

Thr Tyr Asp Leu Ser Ile Ile Ser Lys Tyr Leu Leu Ile Ala Ser Tyr
325 330 335

Ile Cys Ser Tyr Leu Glu Pro Arg Tyr Asp Ala Ser Ile Phe Ser Arg
340 345 350

Lys Thr Arg Ile Ile Gln Gly Arg Ala Ala Tyr Gly Arg Arg Lys Lys
355 360 365

Lys Glu Val Asn Pro Arg Tyr Leu Gln Pro Ser Leu Phe Ala Ile Glu
370 375 380

Arg Leu Leu Ala Ile Phe Gln Ala Ile Phe Pro Ile Gln Gly Lys Ala
385 390 395 400

Glu Ser Gly Ser Leu Ser Ala Leu Arg Glu Glu Ser Leu Met Lys Ala
405 410 415

Asn Ile Glu Val Phe Gln Asn Leu Ser Glu Leu His Thr Leu Lys Leu
420 425 430

Ile Ala Thr Thr Met Asn Lys Asn Ile Asp Tyr Leu Ser Pro Lys Val
435 440 445

Arg Trp Lys Val Asn Val Pro Trp Glu Ile Ile Lys Glu Ile Ser Glu
450 455 460

Ser Val His Phe Asn Ile Ser Asp Tyr Phe Ser Asp Ile His Glu
465 470 475



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1975 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 443..1747

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CGTGTGCTCT TCTATAGTAA TTTGACATTC TCTAAACGCA GAGACGTCTT ATAAAGATTC 60
AACAAATAAG GAATGTTACC TATGCTAGTC GCAACTCTCT CGTAAGTTGA GGGTTGCTAA 120
CAGAAAAACG ATGAGAAGAA ACTTTTGAAA AATATTGTGT GAAAGCAGCA CGAAACAGAG 180
TATGAAAAAA GAATGCGGGC GTCCGTAAAG AGCTAGAATC GCAAGTGTCC AGAATATGCA 240
AGGCTTTCGA ATACACTCCT CACGCTTCTC TTCAGCAAAA ATCAACTCTT TGTGATAAAA 300
CTGTGTATTT CTTTGTTCTT TGCCGTTGTT TACGTTAGTA AGAAATCGGC ATTGAAAAAA 360
AAAATCTCAC ACTAAAATTG CAGAAAAAAG TGTACAATAT CAGTAAATAA AATTGGCCAA 420

AACAATACCA TTAAAACCAG TC ATG TCC ATG CAA CAA GTC CAA CAT TGT GTC 472
Met Ser Met Gln Gln Val Gln His Cys Val
1 5 10

GCA GAA GTA CTT CGA CTA GAT CCA CAA GAA AAA CCG GAC TGG TCG AGC 520
Ala Glu Val Leu Arg Leu Asp Pro Gln Glu Lys Pro Asp Trp Ser Ser
15 20 25

GGA TAT TTG AAG AAG TTG ACT AAT GCG ACA TCG ATT TTA TAT AAT ACT 568
Gly Tyr Leu Lys Lys Leu Thr Asn Ala Thr Ser Ile Leu Tyr Asn Thr
30 35 40

TCA CTG AAC AAG GTA ATG CTG AAA CAA GAT GAA GAG GTT GCT AGA TGT 616
Ser Leu Asn Lys Val Met Leu Lys Gln Asp Glu Glu Val Ala Arg Cys
45 50 55

CAC ATA TGT GCA TAC ATA GCG TCA CAG AAA ATG AAT GAA AAA CAC ATG 664
His Ile Cys Ala Tyr Ile Ala Ser Gln Lys Met Asn Glu Lys His Met
60 65 70

CCT GAC CTT TGC TAT TAT ATA GAC AGT ATT CCC TTG GAG CCG AAA AAA 712
Pro Asp Leu Cys Tyr Tyr Ile Asp Ser Ile Pro Leu Glu Pro Lys Lys
75 80 85 90

GCC AAG CAT TTA ATG AAC CTT TTC AGA CAA AGT TTA TCT AAT TCT TCA 760
Ala Lys His Leu Met Asn Leu Phe Arg Gln Ser Leu Ser Asn Ser Ser
95 100 105

CCT ATG AAA CAA TTT GCT TGG ACA CCG AGC CCC AAA AAG AAC AAA CGC 808
Pro Met Lys Gln Phe Ala Trp Thr Pro Ser Pro Lys Lys Asn Lys Arg
110 115 120

AGT CCA GTA AAG AAC GGT GGG AGG TTT ACT TCT TCT GAT CCG AAA GAG 856
Ser Pro Val Lys Asn Gly Gly Arg Phe Thr Ser Ser Asp Pro Lys Glu
125 130 135

TTG AGG AAT CAA CTG TTT GGT ACA CCA ACT AAA GTT AGG AAA AGC CAA 904
Leu Arg Asn Gln Leu Phe Gly Thr Pro Thr Lys Val Arg Lys Ser Gln
140 145 150

AAT AAT GAT TCG TTC GTA ATA CCA GAA CTA CCC CCC ATG CAA ACC AAT 952
Asn Asn Asp Ser Phe Val Ile Pro Glu Leu Pro Pro Met Gln Thr Asn
155 160 165 170

GAA TCG CCG TCT ATT ACT AGG AGA AAG TTA GCA TTT GAA GAG GAT GAG 1000
Glu Ser Pro Ser Ile Thr Arg Arg Lys Leu Ala Phe Glu Glu Asp Glu
175 180 185

GAT GAG GAT GAA GAG GAA CCA GGA AAC GAC GGT TTG TCT TTA AAA AGC 1048
Asp Glu Asp Glu Glu Glu Pro Gly Asn Asp Gly Leu Ser Leu Lys Ser
190 195 200

CAT AGT AAT AAG AGC ATT ACT GGA ACC AGA AAT GTA GAT TCT GAT GAG 1096
His Ser Asn Lys Ser Ile Thr Gly Thr Arg Asn Val Asp Ser Asp Glu
205 210 215

TAT GAA AAC CAT GAA AGT GAC CCT ACA AGT GAG GAA GAG CCA TTA GGT 1144
Tyr Glu Asn His Glu Ser Asp Pro Thr Ser Glu Glu Glu Pro Leu Gly
220 225 230

GTG CAA GAA AGC AGA AGC GGG AGA ACG AAA CAA AAT AAG GCA GTT GGA 1192
Val Gln Glu Ser Arg Ser Gly Arg Thr Lys Gln Asn Lys Ala Val Gly
235 240 245 250

AAA CCG CAA TCA GAA TTG AAG ACG GCA AAA GCC CTG AGG AAA AGG GGC 1240
Lys Pro Gln Ser Glu Leu Lys Thr Ala Lys Ala Leu Arg Lys Arg Gly
255 260 265

AGA ATA CCA AAT TCT TTG TTA GTA AAG AAG TAT TGC AAA ATG ACT ACT 1288
Arg Ile Pro Asn Ser Leu Leu Val Lys Lys Tyr Cys Lys Met Thr Thr
270 275 280

GAA GAA ATA ATA CGG CTT TGC AAC GAT TTT GAA TTA CCA AGA GAA GTA 1336
Glu Glu Ile Ile Arg Leu Cys Asn Asp Phe Glu Leu Pro Arg Glu Val
285 290 295

GCA TAT AAA ATT GTG GAT GAG TAC AAC ATA AAC GCG TCA AGA TTG GTT 1384
Ala Tyr Lys Ile Val Asp Glu Tyr Asn Ile Asn Ala Ser Arg Leu Val
300 305 310

TGC CCA TGG CAA TTA GTG TGT GGG TTA GTA TTA AAT TGT ACA TTC ATT 1432
Cys Pro Trp Gln Leu Val Cys Gly Leu Val Leu Asn Cys Thr Phe Ile
315 320 325 330

GTA TTT AAT GAA AGA AGA CGC AAG GAT CCA AGA ATT GAC CAT TTT ATA 1480
Val Phe Asn Glu Arg Arg Arg Lys Asp Pro Arg Ile Asp His Phe Ile
335 340 345

GTC AGT AAG ATG TGC AGC TTG ATG TTG ACG TCA AAA GTG GAT GAT GTT 1528
Val Ser Lys Met Cys Ser Leu Met Leu Thr Ser Lys Val Asp Asp Val
350 355 360

ATT GAA TGT GTA AAA TTA GTG AAG GAA TTA ATT ATC GGT GAA AAA TGG 1576
Ile Glu Cys Val Lys Leu Val Lys Glu Leu Ile Ile Gly Glu Lys Trp
365 370 375

TTC AGA GAT TTG CAA ATT AGG TAT GAT GAT TTT GAT GGC ATC AGA TAC 1624
Phe Arg Asp Leu Gln Ile Arg Tyr Asp Asp Phe Asp Gly Ile Arg Tyr
380 385 390

GAT GAA ATT ATA TTT AGG AAA CTG GGA TCG ATG TTA CAA ACC ACC AAT 1672
Asp Glu Ile Ile Phe Arg Lys Leu Gly Ser Met Leu Gln Thr Thr Asn
395 400 405 410

ATT TTG GTC ACA GAC GAC CAG TAC AAT ATT TGG AAG AAA AGA ATT GAA 1720
Ile Leu Val Thr Asp Asp Gln Tyr Asn Ile Trp Lys Lys Arg Ile Glu
415 420 425

ATG GAT TTG GCA TTA ACA GAA CCT TTA TAACATATCC AGTATTAACT 1767
Met Asp Leu Ala Leu Thr Glu Pro Leu
430 435

AAAAGTATAT ATTTGACCAA TACCTGACAT ATCTTCTAAA GCATGCCTTT AGCCCTATAA 1827
CGAGCTAATG TTAGCTCCAT CTTTGCACTT ATGATTGGAT CAGCCCTCAA ACGCTTTTGT 1887
ATCTTTGCAG CTTCCGCGAA GGTAGTAGCT TGAAGTTTTT CATCCATAGT TCTTGCTAAA 1947
ATTGCAGAAT CTTCAAACAA TTCTATGG 1975
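
The CDS feature of SEQ ID NO: 11 and the translation printed beneath the sequence can be cross-checked mechanically. The following is a minimal sketch and not part of the original listing: the helper names `cds_codon_count` and `translate` are hypothetical, and the standard genetic code is assumed. It confirms that LOCATION 443..1747 spans a whole number of codons and re-derives the first ten residues from the codons at positions 443 to 472.

```python
# Illustrative check only. Helper names are hypothetical and the
# standard genetic code is assumed; "Ter" marks a stop codon.
BASES = "TCAG"
AMINO = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Ter Ter Cys Cys Ter Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()

# Build the 64-entry codon table in TCAG order (first, second, third base).
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def cds_codon_count(start, end):
    """Number of codons in a 1-based, inclusive CDS location such as 443..1747."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3


def translate(dna):
    """Translate codons (whitespace ignored) into three-letter residue names."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return [CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3)]


# Codons at positions 443-472 of SEQ ID NO: 11, as printed in the listing.
first_codons = "ATG TCC ATG CAA CAA GTC CAA CAT TGT GTC"
print(cds_codon_count(443, 1747))
print(" ".join(translate(first_codons)))
```

Under these assumptions the codon count matches the 435 residues given for SEQ ID NO: 12. The TAA that immediately follows position 1747 is a stop codon and lies outside the stated CDS range.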

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 435 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Ser Met Gln Gln Val Gln His Cys Val Ala Glu Val Leu Arg Leu
1 5 10 15

Asp Pro Gln Glu Lys Pro Asp Trp Ser Ser Gly Tyr Leu Lys Lys Leu
20 25 30

Thr Asn Ala Thr Ser Ile Leu Tyr Asn Thr Ser Leu Asn Lys Val Met
35 40 45

Leu Lys Gln Asp Glu Glu Val Ala Arg Cys His Ile Cys Ala Tyr Ile
50 55 60

Ala Ser Gln Lys Met Asn Glu Lys His Met Pro Asp Leu Cys Tyr Tyr
65 70 75 80

Ile Asp Ser Ile Pro Leu Glu Pro Lys Lys Ala Lys His Leu Met Asn
85 90 95

Leu Phe Arg Gln Ser Leu Ser Asn Ser Ser Pro Met Lys Gln Phe Ala
100 105 110

Trp Thr Pro Ser Pro Lys Lys Asn Lys Arg Ser Pro Val Lys Asn Gly
115 120 125

Gly Arg Phe Thr Ser Ser Asp Pro Lys Glu Leu Arg Asn Gln Leu Phe
130 135 140

Gly Thr Pro Thr Lys Val Arg Lys Ser Gln Asn Asn Asp Ser Phe Val
145 150 155 160

Ile Pro Glu Leu Pro Pro Met Gln Thr Asn Glu Ser Pro Ser Ile Thr
165 170 175

Arg Arg Lys Leu Ala Phe Glu Glu Asp Glu Asp Glu Asp Glu Glu Glu
180 185 190

Pro Gly Asn Asp Gly Leu Ser Leu Lys Ser His Ser Asn Lys Ser Ile
195 200 205

Thr Gly Thr Arg Asn Val Asp Ser Asp Glu Tyr Glu Asn His Glu Ser
210 215 220

Asp Pro Thr Ser Glu Glu Glu Pro Leu Gly Val Gln Glu Ser Arg Ser
225 230 235 240

Gly Arg Thr Lys Gln Asn Lys Ala Val Gly Lys Pro Gln Ser Glu Leu
245 250 255

Lys Thr Ala Lys Ala Leu Arg Lys Arg Gly Arg Ile Pro Asn Ser Leu
260 265 270

Leu Val Lys Lys Tyr Cys Lys Met Thr Thr Glu Glu Ile Ile Arg Leu
275 280 285

Cys Asn Asp Phe Glu Leu Pro Arg Glu Val Ala Tyr Lys Ile Val Asp
290 295 300

Glu Tyr Asn Ile Asn Ala Ser Arg Leu Val Cys Pro Trp Gln Leu Val
305 310 315 320

Cys Gly Leu Val Leu Asn Cys Thr Phe Ile Val Phe Asn Glu Arg Arg
325 330 335

Arg Lys Asp Pro Arg Ile Asp His Phe Ile Val Ser Lys Met Cys Ser
340 345 350

Leu Met Leu Thr Ser Lys Val Asp Asp Val Ile Glu Cys Val Lys Leu
355 360 365

Val Lys Glu Leu Ile Ile Gly Glu Lys Trp Phe Arg Asp Leu Gln Ile
370 375 380

Arg Tyr Asp Asp Phe Asp Gly Ile Arg Tyr Asp Glu Ile Ile Phe Arg
385 390 395 400

Lys Leu Gly Ser Met Leu Gln Thr Thr Asn Ile Leu Val Thr Asp Asp
405 410 415

Gln Tyr Asn Ile Trp Lys Lys Arg Ile Glu Met Asp Leu Ala Leu Thr
420 425 430

Glu Pro Leu
435

WHAT IS CLAIMED IS:

1. A composition comprising an isolated nucleic acid encoding a
biologically active unique portion of an ORC polypeptide.

2. A composition according to claim 1, wherein said ORC gene is ORC1.

3. A composition according to claim 1, wherein said ORC gene is ORC2.

4. A composition according to claim 1, wherein said ORC gene is ORC3.

5. A composition according to claim 1, wherein said ORC gene is ORC4.

6. A composition according to claim 1, wherein said ORC gene is ORC5.

7. A composition according to claim 1, wherein said ORC gene is ORC6.

8. A composition comprising a recombinant, biologically active unique
portion of an ORC protein.

9. A method of identifying an ORC selective agent, said method
comprising the steps of:

contacting an agent with a composition according to claim 8;

measuring in at least qualitative terms the binding affinity of said agent for
said composition.

10. A method for identifying a gene encoding a protein which directly
or indirectly associates with a selected DNA sequence, said method comprising the
steps of:

transforming an expression library of hybrid proteins into a reporter strain,
wherein said library comprises protein-coding sequences fused to a constitutively
expressed transcription activation domain and said reporter strain comprises a
reporter gene with at least one copy of a selected DNA sequence in its promoter
region;

detecting the transcription or translation product of said reporter gene in a
clone of said reporter strain;

recovering said clone;

whereby said clone comprises a gene encoding a protein which directly or
indirectly associates with said selected DNA sequence.